Abstract: In this paper we introduce a notion of causal transference plans on Polish spaces. These plans 
are probabilities on the product space which generalize the adapted processes of stochastic calculus in the 
same way as the Monge-Kantorovich transference plans generalize the Monge transference plans. We provide 
a detailed study of their main properties. Then we introduce the associated causal transport problems and 
we prove a very general result of existence of solutions to the causal Monge-Kantorovich problems. Finally 
we relate these problems to stochastic optimal control, and we investigate the transports of the Wiener 
measure for the quadratic cost. Weak solutions to some stochastic differential equations whose solutions 
can be obtained by transformation of the drift then appear as optimal transference plans to these causal 
Monge-Kantorovich problems, while the existence of a unique strong solution to these equations is related 
to the existence of an optimum of Monge. In this case, the causal counterpart of the Wasserstein distance 
is the square root of the relative entropy. 
1. Introduction 

Over the last decade a striking analogy between stochastic control and optimal transport has
been investigated by T. Mikami and M. Thieullen (see [29], [28], [30], [31]) in close connection with
stochastic mechanics (see [H]). Recently these ideas have also received growing interest through
their implications in financial mathematics (for instance see [35], [6] and the references therein).
Moreover, similar problems were considered in [21] and [23], where an analogy between optimal
transport and stochastic differential equations also appeared. However, while optimal transport
focuses on transference plans on a product space, the couplings involved in the above papers are a
mixture of several conditions, which makes it hard to get a clear intuition of the real origin of this analogy.
The present paper aims to clarify this origin by providing a notion of causal transference
plans on the product space which generalize the adapted processes of stochastic calculus, and by 
investigating the associated Monge-Kantorovich problems. Although our definition is given on Polish 
spaces, in the case of the paths spaces of stochastic calculus, we shall see that the associated causal 
couplings are usually handled in stochastic optimal control and in the study of stochastic differential 
equations. In this case, causal transference plans provide the analytic formulation of these couplings 
which was missing in the literature, thus yielding general and efficient proofs. In particular this 
formulation is particularly suitable to investigate the compactness of families of such couplings. 
This paper is divided into two parts. In the first one (Section [2] to Section [5]) we
provide the general definition of causal transference plans, we study their topological properties, and
we investigate the associated causal Monge-Kantorovich problems. In the second part (Section [6]
to Section [8]) we relate these problems to stochastic optimal control and to stochastic differential
equations. We found this definition of causal transference plans hidden between the lines of the
proof of the Yamada-Watanabe criterion for stochastic differential equations on the Wiener space.
Within this context the causal transference plans which will be introduced in Section [3] generalize the
adapted processes in exactly the same way as the Monge-Kantorovich transference plans generalize
the Monge transference plans by allowing the mass to be split during the transport. This task
is achieved thanks to the conditional probability kernel of a probability on the product space with 
respect to its first marginal. Although we found our motivations in Mikami's works and our first 
applications on the Wiener space, the first part of this paper is written in the general framework of 
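two Polish spaces $E$ and $S$ endowed with filtrations $(\mathcal{F}_t^E)_{t\in[0,1]}$ and $(\mathcal{F}_t^S)_{t\in[0,1]}$ of their Borel
sigma-fields $\mathcal{B}(E)$ and $\mathcal{B}(S)$, i.e. these filtrations satisfy $\mathcal{F}_t^E \subset \mathcal{B}(E)$ and $\mathcal{F}_t^S \subset \mathcal{B}(S)$ for any $t \in [0,1]$. There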
are essentially two reasons why we have chosen to present these results in this general framework. 
The first one is that this framework encompasses most of the potential applications as well in finite 
dimension, as in infinite dimensions, that it holds for any Borel probability, and that it may be 
applied as well to the continuous processes as to the jump processes. Indeed we recall that, for 
instance, the space $D([0,1],\mathbb{R}^d)$ of the càdlàg processes is turned into a Polish space when it is 
endowed with the corresponding Skorokhod topology. As a matter of fact, within this setting, our 
analytic formulation to these problems even applies to point processes. The second reason is that 
these problems may be of interest for a reader working on optimal transport who would seek to add 
a physically relevant time constraint in his models. As a consequence, reading the first part of this 
paper does not require any prior knowledge in stochastic calculus. Within this general framework 
the notion of adapted process can still be defined for measurable mappings of the form $U : E \to S$
by means of the inverse image of a filtration, as will be recalled in full detail in the preliminary
Section [2]. Specifically we focus on the case where an adapted mapping transports a Borel probability
measure $\eta$ of $E$ to a Borel probability measure $\nu$ of $S$. Then we generalize these adapted mappings
as being some probabilities $\gamma$ on the product space $E \times S$. To be concrete and to justify why we
call these plans causal, we stress here that, as will be stated accurately in Definition [1], any
transference plan $\gamma$ with marginals $\eta$ and $\nu$ induces a filtration $(\mathcal{G}_t(\gamma))_{t\in[0,1]}$ on the space $E$. Roughly
speaking, at a given $t \in [0,1]$ the sigma-field $\mathcal{G}_t(\gamma)$ represents the information that the shipper needs
to know among $\mathcal{B}(E)$ (which may be thought of as being the whole information contained on the
initial space) to be able to build the distribution $\nu|_{\mathcal{F}_t^S}$ (the restriction of the distribution $\nu$ to the
sigma-field $\mathcal{F}_t^S$), which may be seen informally as the distribution $\nu$ such as it appears at time $t$. A
causal transference plan then appears as a transference plan which can be realized dynamically at
each $t$ by means of the information available on the first space at time $t$. The information which
appears on $E$ after a time $t$ is not involved in realizing the plan at time $t$, which expresses the notion
of causality of physics. By taking filtrations constant and equal to their Borel sigma-fields, these plans
encompass usual transference plans. We study their first properties by focusing on their topology
in Section [4]. Then, we define the associated transport problems and we prove the existence of a
solution to the causal Monge-Kantorovich problems under very mild conditions (Theorem [3]). The
study of the precise geometry of these optimal plans (in the sense of [15]) in the general case goes
far beyond the purpose of this paper. However, we give much more information on the latter in the
second part of this paper which is devoted to the causal optimal transports of the Wiener measure. 
In this second part we first show how the causal Monge-Kantorovich problems are involved in the 
problems of stochastic optimal control investigated by T.Mikami. Then we prove that under slight 
conditions, the joint law of the weak solutions to the stochastic differential equations of the form 

(1.1) $dX_t = dB_t - v_t \circ X \, dt$

can be seen as a Monge-Kantorovich optimum to the problems of causal transport of the Wiener 
measure with a quadratic cost, while the existence of a unique strong solution to these equations 
occurs if and only if these optimal plans are concentrated on the graph of a strong
solution. In other words, the existence of a unique strong solution to these equations is related
to the geometry of the causal optimal transports. The equations of the shape (1.1) encompass
the Markovian diffusions investigated by Zvonkin (see [19], [52], [46] and also [27] for a recent
contribution) as well as the famous Cirelson's equation (see [18]). Hence, we provide here a new way to
investigate several important problems of stochastic differential equations which remain mysterious
on many points, despite several decades of investigations. As we shall see, the Malliavin derivative
is essentially involved in the geometry of the optimal causal transference plans. Another interesting
point worth noting is that in this case the square root of the relative entropy appears as
the causal counterpart to the Wasserstein distance. At least physically it is quite satisfactory
to see that the entropy appears as one introduces the arrow of time. 

The structure of this paper is the following. The first part of the paper (Section [2]-Section [5])
provides the notion of causal transference plans and investigates the causal counterpart of the
Monge-Kantorovich problems on Polish spaces. In Section [2] we fix the main notations and definitions which
will be used in the whole paper. In particular we recall how a filtration and a notion of adapted
process can still be defined naturally in the general framework of two Polish spaces. In Section [3]
we introduce the notion of causal transference plans (Definition [1]), in such a way that it generalizes
the adapted processes usually handled in stochastic control and in stochastic calculus. We also fully
characterize the causal couplings (i.e. the realizations of a causal transference plan by a pair of
mappings defined on a same probability space) in Proposition [1]. In Section [4] we study the topological
properties of some interesting subsets of causal transference plans. The main result of this section
is Theorem [2], which gathers the main topological properties of these subsets. Section [5]
introduces the problems of optimal transport for causal transference plans, namely the causal Monge
problems and the causal Monge-Kantorovich problems (Definition [2]), and proves the existence of
an optimal Monge-Kantorovich transference plan under very weak assumptions (Theorem [3]). The
second part of this paper (Section [6]-Section [8]) investigates the causal optimal transports of the law
of the standard Brownian motion. In Section [6] we characterize the causal couplings whose first
marginal is the Wiener measure and we show how the causal Monge-Kantorovich problems of Section [5]
are related to the stochastic optimal control problems investigated by Mikami. In this case the causal
transport plans appear implicitly in the lines of the proof of the Yamada-Watanabe criterion. Furthermore
any solution $(X, B)$ to a stochastic differential equation is a causal coupling. In Section [8] we go
one step further by investigating the causal optimal transports of the Wiener measure for the
quadratic cost function. Namely, we prove that under mild conditions the joint laws of weak solutions
to the generally non-Markovian stochastic differential equations of the shape (1.1) are solutions to
the causal Monge-Kantorovich problems of Section [5] (Theorem [4]), while the existence of a unique
strong solution happens if and only if this optimal transference plan is supported by the graph of a
solution to the associated causal Monge problem (Corollary [2]). We also provide a dual formulation
(Corollary [4]) which is very similar to the dual formulation in the non-causal case. In Section [7]
we recall the basic facts about Wiener space and about the transformations of the Wiener measure
which we use in Section [8]. In particular we define the Girsanov shifts and we recall how to compute
them explicitly by means of Malliavin calculus.

2. Preliminaries and notations 

In the whole paper, $E$ and $S$ will denote two Polish spaces whose Borel sigma-fields are denoted
respectively by $\mathcal{B}(E)$ and $\mathcal{B}(S)$. The set of the Borel probability measures on $E$ (resp. on $S$) will
be denoted by $\mathcal{P}(E)$ (resp. by $\mathcal{P}(S)$), while the set of the probabilities on $E \times S$ endowed with
the sigma-field $\mathcal{B}(E) \otimes \mathcal{B}(S)$ will be denoted by $\mathcal{P}(E \times S)$. Given two probabilities $\eta \in \mathcal{P}(E)$ and
$\nu \in \mathcal{P}(S)$, by a morphism of probability spaces between $\eta$ and $\nu$ we mean any Borel measurable
mapping

$U : E \to S$

which is defined $\eta$ almost everywhere in such a way that the direct image of the probability $\eta$ under
$U$ is $\nu$, which we write

$U_*\eta = \nu$

We will denote by $\mathcal{R}(\eta, \nu)$ the set of the morphisms of probability spaces between $\eta$ and $\nu$. Informally
any $U \in \mathcal{R}(\eta, \nu)$ may be seen as a way to transport the probability $\eta$, which may be thought of as
being a distribution of mass, to the probability $\nu$, by carrying all the mass which is located at a
given $\omega$ to a unique $\widetilde{\omega}$ which is given by $U(\omega)$. $L^0(\eta, S)$ will denote the set which is obtained by
identifying the Borel measurable mappings $U : E \to S$ which are $\eta$-a.s. equal. A relaxed notion of
transport can be introduced by means of the product space. To handle the latter, we need to
introduce the projections $\pi$ (resp. $\widetilde{\pi}$) on the first (resp. second) coordinate of $E \times S$, that is

$\pi : (\omega, \widetilde{\omega}) \in E \times S \to \omega \in E$
$\widetilde{\pi} : (\omega, \widetilde{\omega}) \in E \times S \to \widetilde{\omega} \in S$
The set of the transference plans between $\eta$ and $\nu$ is then the set $\Pi(\eta, \nu)$ of the probabilities on
$E \times S$ whose first (resp. second) marginal is $\eta$ (resp. $\nu$), which reads

(2.2) $\Pi(\eta, \nu) := \{\gamma \in \mathcal{P}(E \times S) \mid \pi_*\gamma = \eta,\ \widetilde{\pi}_*\gamma = \nu\}$

On the other hand the set of all the transference plans of a given probability $\eta$ will be denoted by

(2.3) $\mathcal{P}^S(\eta) := \bigcup_{\nu \in \mathcal{P}(S)} \Pi(\eta, \nu)$

A straightforward way to understand why one may think of a given $\gamma \in \Pi(\eta, \nu)$ as a way to
transport the mass is to introduce the conditional probability kernel of $\gamma$ with respect to $\eta$. As a
matter of fact, these kernels will be the basic objects we will be interested in within the whole paper.
Here, we just provide the definition and we refer to [33] Chapter I or to [TH] (Chapter 1 and p. 164)
and the references therein for further details. Since $E$ and $S$ are Polish spaces, for any $\gamma \in \Pi(\eta, \nu)$
there exists a unique function

$\Theta_\gamma : (\omega, B) \in E \times \mathcal{B}(S) \to \Theta_\gamma^\omega(B) \in \mathbb{R}$

which satisfies the following properties

(i) For any $\omega \in E$, $\Theta_\gamma^\omega \in \mathcal{P}(S)$

(ii) For any fixed $B \in \mathcal{B}(S)$ the mapping

$\omega \in E \to \Theta_\gamma^\omega(B) \in \mathbb{R}$

is $\mathcal{B}(E)^\eta / \mathcal{B}(\mathbb{R})$ measurable

(iii) For any $A \in \mathcal{B}(E)$, $B \in \mathcal{B}(S)$ we have

(2.4) $\gamma(A \times B) = \int_A \Theta_\gamma^\omega(B)\, \eta(d\omega)$

A straightforward calculation shows that $\eta$-a.s.

(2.5) $\Theta_\gamma^\omega(B) = \gamma(\widetilde{\pi} \in B \mid \pi = \omega)$

Thus, roughly speaking, $\Theta_\gamma^\omega(d\widetilde{\omega})$ represents the proportion of the mass at a given $\omega$ which is
transported to the point $\widetilde{\omega}$ by the transference plan $\gamma$. Now, note that to any $U \in \mathcal{R}(\eta, \nu)$ (i.e. any
morphism of probability spaces from $\eta \in \mathcal{P}(E)$ to $\nu \in \mathcal{P}(S)$) we can associate the probability $\gamma_U$
whose probability kernel $\Theta_{\gamma_U}^\omega$ is given $\eta$-a.s. by the Dirac measure $\delta_{U(\omega)}$ concentrated at $U(\omega)$. This
reads, for any $A \in \mathcal{B}(E)$ and $B \in \mathcal{B}(S)$,

(2.6) $\gamma_U(A \times B) = E_\eta[1_A\, \delta_U(B)] := E_\eta[1_A\, 1_B \circ U]$

In the sequel, for any two measurable mappings $Y : \Omega \to E$ and $X : \Omega \to S$ defined on a probability
space $(\Omega, \mathcal{A}, \mathcal{P})$ we will write

$Y \times X : \omega \in \Omega \to (Y(\omega), X(\omega)) \in E \times S$

Moreover, whenever $\eta := Y_*\mathcal{P}$ and $\nu := X_*\mathcal{P}$ we will say that $(Y, X, (\Omega, \mathcal{A}, \mathcal{P}))$ (or for short that
$(Y, X)$) is a coupling of $(\eta, \nu)$. With these notations we have

(2.7) $\gamma_U = (I_E \times U)_*\eta$
where
$I_E$ denotes the identity map on $E$. For $\eta \in \mathcal{P}(E)$ and $\nu \in \mathcal{P}(S)$, the couplings $(Y, X, (\Omega, \mathcal{A}, \mathcal{P}))$ of
$(\eta, \nu)$ such that $(Y \times X)_*\mathcal{P}$ is of the form (2.7) are called the deterministic couplings of $(\eta, \nu)$.
Deterministic couplings naturally model systems which answer in a deterministic way to a random
input $Y$, while $X$ can be seen as the output of the system. Indeed in this case by definition there
exists a $U \in \mathcal{R}(\eta, \nu)$ such that $\mathcal{P}$-a.s. $X = U(Y)$. Furthermore in view of (2.5) we can write
informally $\eta$-a.s.

$\gamma_U(\widetilde{\pi} \in d\widetilde{\omega} \mid \pi = \omega) = \delta_{U(\omega)}(d\widetilde{\omega})$

which means that, as expected, a deterministic coupling may be seen as a transference plan
where all the mass at a given $\omega \in E$ goes almost surely to the same point $\widetilde{\omega}$: it is not split and it
is supported by the graph of a given morphism. Note that these notions of morphisms of probability
spaces and of transference plans do not involve any filtration. Hence, even if realized on some path
spaces, these transports are not constrained by a structure which models causality.
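
As a minimal sketch of (2.4) to (2.6) on finite spaces (an illustration that is not part of the paper; the helper names `conditional_kernel` and `plan_from_map` are hypothetical), a transference plan can be stored as a matrix indexed by E x S. Its conditional kernel is obtained by dividing each row by the first marginal, and the plan associated with a map U has rows that are Dirac masses.

```python
import numpy as np

def conditional_kernel(gamma):
    """Return (eta, theta) where theta[w, s] = gamma(second = s | first = w)."""
    gamma = np.asarray(gamma, dtype=float)
    eta = gamma.sum(axis=1)                  # first marginal of the plan
    theta = np.zeros_like(gamma)
    pos = eta > 0                            # rows of eta-null points are left at 0
    theta[pos] = gamma[pos] / eta[pos][:, None]
    return eta, theta

def plan_from_map(eta, U, n_S):
    """Plan (I_E x U)_* eta for a map U: E -> S given as an integer array."""
    eta = np.asarray(eta, dtype=float)
    gamma = np.zeros((len(eta), n_S))
    gamma[np.arange(len(eta)), U] = eta      # all the mass at w is sent to U[w]
    return gamma

# A plan that splits the mass located at the first point of E.
gamma = np.array([[0.25, 0.25],
                  [0.00, 0.50]])
eta, theta = conditional_kernel(gamma)
recon = theta * eta[:, None]                 # disintegration formula (2.4)

# A deterministic plan: its kernel takes only the values 0 and 1.
gamma_U = plan_from_map([0.5, 0.5], np.array([1, 0]), 2)
_, theta_U = conditional_kernel(gamma_U)
is_deterministic = bool(np.all((theta_U == 0) | (theta_U == 1)))
```

In this toy setting the first row of `theta` records that the mass at the first point is split evenly, while `theta_U` is a permutation matrix, i.e. the plan is supported by the graph of U.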

To introduce the arrow of time and to define adapted processes we assume henceforth and for
the whole paper that a filtration $(\mathcal{B}_t(E))_{t\in[0,1]}$ (resp. $(\mathcal{B}_t(S))_{t\in[0,1]}$) of the Borel sigma-field $\mathcal{B}(E)$
(resp. $\mathcal{B}(S)$) is given once and for all on $E$ (resp. on $S$). Furthermore, given a probability $\eta \in \mathcal{P}(E)$
(resp. $\nu \in \mathcal{P}(S)$, resp. $\gamma \in \mathcal{P}(E \times S)$), the usual augmentation (see [4]) of the filtration $(\mathcal{B}_t(E))_{t\in[0,1]}$
(resp. of $(\mathcal{B}_t(S))_{t\in[0,1]}$, resp. of $(\mathcal{B}_t(E) \otimes \mathcal{B}_t(S))_{t\in[0,1]}$) with respect to $\eta$ (resp. to $\nu$, resp. to $\gamma$) will
be denoted by $(\mathcal{F}_t^\eta)$ (resp. by $(\mathcal{F}_t^\nu)$, resp. by $(\mathcal{F}_t^\gamma)$). We also set $\mathcal{B}(E)^\eta$ (resp. $\mathcal{B}(S)^\nu$) to be the
completion of $\mathcal{B}(E)$ (resp. of $\mathcal{B}(S)$) with respect to $\eta$ (resp. to $\nu$). We are now ready to recall how
one may define adapted mappings within this general framework. Let $(\Omega, \mathcal{A}, \mathcal{P})$ be a probability
space and let $X : \Omega \to S$ be a mapping which is $\mathcal{A}/\mathcal{B}(S)$ measurable. Then for any $t \in [0,1]$ the
inverse image

(2.8) $X^{-1}(\mathcal{B}_t(S))$

is a sigma-field on $(\Omega, \mathcal{A})$ which increases with $t$, so that it defines a filtration. In this paper we will
call $(X^{-1}(\mathcal{B}_t(S)))_{t\in[0,1]}$ the filtration generated by $X$ and we will denote by $(\mathcal{G}_t^X)$ the usual augmentation
of this filtration with respect to $\mathcal{P}$. Hence, given a complete filtration $(\mathcal{A}_t)$ on $(\Omega, \mathcal{A}, \mathcal{P})$, $X$ will be
said to be $(\mathcal{A}_t)$-adapted if and only if for any $t \in [0,1]$ we have

$\mathcal{G}_t^X \subset \mathcal{A}_t$

To make this definition clearer, as well as for the applications below in this paper, we now introduce
the space $W = C([0,1], \mathbb{R}^d)$ of the continuous $\mathbb{R}^d$ valued paths $\omega : t \in [0,1] \to \mathbb{R}^d$ whose coordinate
process $(W_t)$ is defined by $(t, \omega) \in [0,1] \times W \to W_t(\omega) := \omega(t) \in \mathbb{R}^d$. By taking the space $S = W$
and by choosing $\mathcal{B}_t(W) := \sigma(W_s, s \leq t)$, at any $t \in [0,1]$ the sigma-field given by (2.8) is equal
to $\sigma(X_s, s \leq t)$. This means that the definition we use agrees with the standard use of the word.
Turning back to the general case with $\eta \in \mathcal{P}(E)$ and $\nu \in \mathcal{P}(S)$, a morphism of probability spaces
$U \in \mathcal{R}(\eta, \nu)$ will simply be said to be adapted (without specifying the underlying filtration) if it
is $(\mathcal{F}_t^\eta)$-adapted, i.e. if and only if for any $t \in [0,1]$ we have

$\mathcal{G}_t^U \subset \mathcal{F}_t^\eta$

and we denote by $\mathcal{R}_a(\eta, \nu)$ (resp. $L_a^0(\eta, S)$) the subset of the adapted elements of $\mathcal{R}(\eta, \nu)$ (resp. of
$L^0(\eta, S)$). Of course in the case $S = W$, this definition corresponds exactly to the notion of adapted
continuous processes.
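
To make the notion of adapted mapping concrete, here is a small sketch under simplifying assumptions that are not in the paper: time is discrete with T steps, E = S is the set of binary paths of length T, the filtration at step t is generated by the first t coordinates, and the reference measure charges every path (so null sets play no role). The function names are hypothetical. A map U is then adapted exactly when the first t coordinates of U(w) depend only on the first t coordinates of w.

```python
from itertools import product

def is_adapted(U, T):
    """Check that U(w)[:t] is a function of w[:t] for every t = 1, ..., T."""
    paths = list(product((0, 1), repeat=T))
    for t in range(1, T + 1):
        seen = {}                      # maps a past w[:t] to the observed output U(w)[:t]
        for w in paths:
            out = U(w)[:t]
            if seen.setdefault(w[:t], out) != out:
                return False           # two paths share a past but give different outputs
    return True

def running_max(w):
    """Adapted example: the running maximum only looks at the past."""
    best, out = 0, []
    for x in w:
        best = max(best, x)
        out.append(best)
    return tuple(out)

def time_reversal(w):
    """Non-adapted example: the first output coordinate is the last input coordinate."""
    return tuple(reversed(w))

adapted_ok = is_adapted(running_max, 3)
reversal_ok = is_adapted(time_reversal, 3)
```

Here `adapted_ok` is true and `reversal_ok` is false, since the time reversal needs information from the future of the input path.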
Another notion which will be of interest for the applications is the isomorphism of probability
spaces and of filtered probability spaces. There exist several definitions of these concepts across
the literature, but in this paper we will adopt the simplest. Namely, a morphism $U \in \mathcal{R}(\eta, \nu)$ will
be said to be an isomorphism of probability spaces between $\eta \in \mathcal{P}(E)$ and $\nu \in \mathcal{P}(S)$ if there is a
$V \in \mathcal{R}(\nu, \eta)$ such that $\eta$-a.s.

(2.9) $V \circ U = I_E$
and $\nu$-a.s.

(2.10) $U \circ V = I_S$

In the above formulas it is elementary to check that the pull-backs are well defined (for instance see
[26]), that (2.9) and (2.10) imply each other, and also that both of these conditions are equivalent
to

$(I_E \times U)_*\eta = (V \times I_S)_*\nu$
Although we won't use it in the sequel, we recall that thanks to Borel's isomorphism $J_F : F \to
[0,1]$ any Polish space $F$ can be endowed with at least one non-trivial filtration by considering
$(J_F^{-1}(\mathcal{B}([0,t])))_{t\in[0,1]}$. The adapted counterparts to these isomorphisms are the isomorphisms of
filtered probability spaces: we will say that $U$ is an isomorphism of filtered probability spaces if $U$
is an isomorphism with inverse $V$ and both $U$ and $V$ are adapted, i.e. we have both $U \in \mathcal{R}_a(\eta, \nu)$
and $V \in \mathcal{R}_a(\nu, \eta)$. In the latter case one speaks of an isomorphism of filtered probability spaces since
the definition yields

$(\mathcal{G}_t^U) = (\mathcal{F}_t^\eta)$

and

$(\mathcal{G}_t^V) = (\mathcal{F}_t^\nu)$

In this paper such isomorphisms which map the filtration will only be used in the applications
on the Wiener space. The particular case where they take values in the Wiener space plays a key
role in stochastic analysis and in particular in Malliavin calculus. The essential reason is that
any isomorphism of probability spaces with values in the Wiener space induces an isomorphism of
Gaussian space (see [25]), while isomorphisms of filtered probability spaces make it possible to map directly
the adapted quasi-invariant flows by inverse image, thus also providing a stochastic integral on the
related space. A well known example of such isomorphisms is the Ito map of stochastic differential
geometry (see [7], [17] or [3]). In Section [7] of this paper we will meet other such isomorphisms
with values in the Wiener space related to stochastic differential equations.

3. Causal transference plans and their causal couplings 

We recall that $E$ and $S$ are two Polish spaces fixed once and for all, each one with a fixed filtration of
its Borel sigma-field (see Section [2]). We first motivate the definition of causal transference plans
which is given in Definition [1]. Then we fully characterize the causal couplings (Proposition [1]), i.e.
the realizations of a causal transference plan by a pair of processes defined on a same probability
space.

In order to introduce Definition [1] naturally, and to enable the reader to get an intuition
more quickly, we now mention the property of transference plans associated to adapted processes. Let
$\eta \in \mathcal{P}(E)$, $\nu \in \mathcal{P}(S)$ and $U \in \mathcal{R}(\eta, \nu)$. According to the definition of Section [2], $U$ is adapted if and
only if for any $t \in [0,1]$ and any $B \in \mathcal{B}_t(S)$, $U^{-1}(B) \in \mathcal{F}_t^\eta$. Returning to the transference plan $\gamma_U$ associated to $U$
by (2.6), we have $\eta$-a.s.

$\Theta^\omega_{(I_E \times U)_*\eta}(B) = \delta_{U(\omega)}(B) = 1_B \circ U(\omega) = 1_{U^{-1}(B)}(\omega)$
which means that $U$ is adapted if and only if for any $t \in [0,1]$ and for any $B \in \mathcal{B}_t(S)$ the mapping
$\omega \to \Theta^\omega_{(I_E \times U)_*\eta}(B)$ is $\mathcal{F}_t^\eta$ measurable. This latter property motivates Definition [1].

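Definition 1. Let $\eta \in \mathcal{P}(E)$, $\nu \in \mathcal{P}(S)$ and $\gamma \in \Pi(\eta, \nu)$ (see (2.2)). For any $t \in [0,1]$ we denote by $\mathcal{G}_t(\gamma)$
the smallest sigma-field on $E$ such that for any $C \in \mathcal{B}_t(S)$ the mapping

$\omega \in E \to \Theta_\gamma^\omega(C) \in \mathbb{R}$

is measurable, where $\Theta_\gamma$ is the conditional probability kernel of $\gamma$ with respect to $\eta$ (see Section [2]).
We call $(\mathcal{G}_t(\gamma))_{t\in[0,1]}$ the filtration generated by $\gamma$. A transference plan $\gamma \in \Pi(\eta, \nu)$ will be said to be
causal if and only if for any $t \in [0,1]$

$\mathcal{G}_t(\gamma) \subset \mathcal{F}_t^\eta$

We denote by $\Pi_c(\eta, \nu)$ the set of the causal transference plans from $\eta$ to $\nu$, which is given by

$\Pi_c(\eta, \nu) := \{\gamma \in \Pi(\eta, \nu) \mid \forall t \in [0,1],\ \mathcal{G}_t(\gamma) \subset \mathcal{F}_t^\eta\}$

while the set $\mathcal{P}_c^S(\eta)$ of the causal transference plans on $S$ from $\eta$ is defined by

$\mathcal{P}_c^S(\eta) := \bigcup_{\nu \in \mathcal{P}(S)} \Pi_c(\eta, \nu)$

Finally, a triplet $(Y, X, (\Omega, \mathcal{A}, \mathcal{P}))$ (or $(Y, X)$ for short) will be called a causal coupling if and only if
$Y : \Omega \to E$ and $X : \Omega \to S$ are two measurable mappings (with respect to $\mathcal{A}/\mathcal{B}(E)$ resp. to $\mathcal{A}/\mathcal{B}(S)$)
defined on a same complete probability space $(\Omega, \mathcal{A}, \mathcal{P})$ and $(Y \times X)_*\mathcal{P} \in \mathcal{P}_c^S(Y_*\mathcal{P})$.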

Note that this definition means that $\gamma \in \Pi(\eta, \nu)$ is causal if and only if for any $t \in [0,1]$ and
for any $C \in \mathcal{B}_t(S)$, the mapping $\omega \in E \to \Theta_\gamma^\omega(C) \in \mathbb{R}$ is $\mathcal{F}_t^\eta$ measurable. Thus in view of the
preliminary remark, causal transference plans generalize the notion of adapted processes, as
announced in the introduction. We now characterize the causal couplings:
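
The measurability condition above can be tested directly in a toy setting. The following sketch is an illustration under assumptions that are not in the paper: discrete time with T steps, binary paths for both E and S, filtrations generated by the first t coordinates, and plans stored as dictionaries `{(w, s): mass}`; all function names are hypothetical. A plan is causal when, for each t, the law of the first t coordinates of the output under the kernel at w depends only on the first t coordinates of w (for the points w charged by the first marginal).

```python
from collections import defaultdict
from itertools import product

def kernel_marginals(gamma, t):
    """For each w with positive mass: law of s[:t] under the conditional kernel at w."""
    eta = defaultdict(float)
    joint = defaultdict(float)
    for (w, s), m in gamma.items():
        eta[w] += m
        joint[(w, s[:t])] += m
    laws = defaultdict(dict)
    for (w, head), m in joint.items():
        if eta[w] > 0:
            laws[w][head] = m / eta[w]
    return laws

def is_causal(gamma, T, tol=1e-12):
    """Check that w -> Theta^w(C) depends only on w[:t] for cylinders C based on s[:t]."""
    for t in range(1, T + 1):
        laws = kernel_marginals(gamma, t)
        ref = {}                                   # reference law for each past w[:t]
        for w, law in laws.items():
            other = ref.setdefault(w[:t], law)
            keys = set(law) | set(other)
            if any(abs(law.get(k, 0.0) - other.get(k, 0.0)) > tol for k in keys):
                return False
    return True

paths = list(product((0, 1), repeat=2))
# Product plan: the kernel does not depend on w at all.
product_plan = {(w, s): 1 / 16 for w in paths for s in paths}
# Plan of the time reversal map: the output at time 1 uses the input at time 2.
reversal_plan = {(w, w[::-1]): 0.25 for w in paths}
# Causal plan that splits mass: copy the first coordinate, then toss an independent coin.
split_plan = {(w, (w[0], b)): 0.125 for w in paths for b in (0, 1)}

results = (is_causal(product_plan, 2), is_causal(reversal_plan, 2), is_causal(split_plan, 2))
```

The third plan is causal although it is not induced by any map, which is the kind of generalization of adapted processes the definition is designed to allow.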

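Proposition 1. Let $(\Omega, \mathcal{A}, \mathcal{P})$ be a complete probability space and $X : \Omega \to S$ and $Y : \Omega \to E$ be
two mappings which are $\mathcal{A}/\mathcal{B}(S)$ (resp. $\mathcal{A}/\mathcal{B}(E)$) measurable. Further denote by $(\mathcal{G}_t^X)$ (resp. $(\mathcal{G}_t^Y)$) the filtration
generated by $X$ (resp. by $Y$) which is defined in Section [2]. Then the following assertions are
equivalent:

(i) $(Y \times X)_*\mathcal{P} \in \Pi_c(Y_*\mathcal{P}, X_*\mathcal{P})$, i.e. $(Y, X)$ is a causal coupling (see Definition [1]).
(ii) For any $t \in [0,1]$ and for any $A \in \mathcal{B}_t(S)$ we have $\mathcal{P}$-a.s.

$\mathcal{P}(X \in A \mid \mathcal{G}_t^Y) = \mathcal{P}(X \in A \mid \sigma(Y))$

(iii) For any $t \in [0,1]$ we have

$\mathcal{L}aw(Y \mid \sigma(\mathcal{G}_t^X \cup \mathcal{G}_t^Y)) = \mathcal{L}aw(Y \mid \mathcal{G}_t^Y)$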
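Proof: Let $\Theta$ be the conditional kernel of $(Y \times X)_*\mathcal{P}$ with respect to $Y_*\mathcal{P}$. By definition we have
(3.11) $\mathcal{P}(X \in A \mid \sigma(Y)) = \Theta^Y(A)$

for any $A \in \mathcal{B}(S)$, where

$\mathcal{P}(X \in A \mid \sigma(Y)) := E_\mathcal{P}[1_A \circ X \mid \sigma(Y)]$
Furthermore by setting

$\mathcal{P}(X \in A \mid \mathcal{G}_t^Y) := E_\mathcal{P}[1_A \circ X \mid \mathcal{G}_t^Y]$

for $t \in [0,1]$ we get $\mathcal{P}$-a.s.

(3.12) $\mathcal{P}(X \in A \mid \mathcal{G}_t^Y) = E_\mathcal{P}[\Theta^Y(A) \mid \mathcal{G}_t^Y] = E_{Y_*\mathcal{P}}[\Theta(A) \mid \mathcal{F}_t^{Y_*\mathcal{P}}] \circ Y$
Hence from (3.11) and (3.12) we see that $\mathcal{P}$-a.s.

(3.13) $\mathcal{P}(X \in A \mid \mathcal{G}_t^Y) = \mathcal{P}(X \in A \mid \sigma(Y))$
if and only if $\mathcal{P}$-a.s.

$\Theta^Y(A) = E_{Y_*\mathcal{P}}[\Theta(A) \mid \mathcal{F}_t^{Y_*\mathcal{P}}] \circ Y$
which occurs if and only if $Y_*\mathcal{P}$-a.s.

(3.14) $\Theta(A) = E_{Y_*\mathcal{P}}[\Theta(A) \mid \mathcal{F}_t^{Y_*\mathcal{P}}]$

The equivalence between (3.13) and (3.14) implies that (i) $\Leftrightarrow$ (ii). Moreover, for any $C \in \mathcal{B}_t(S)$,
$D \in \mathcal{B}_t(E)$ and for any $f \in C_b(E)$ we have

$E_\mathcal{P}[f \circ Y\, 1_{C \times D}(X, Y)] = E_\mathcal{P}[f \circ Y\, 1_D(Y)\, \mathcal{P}(X \in C \mid \sigma(Y))]$

while

$E_\mathcal{P}[E_\mathcal{P}[f \circ Y \mid \mathcal{G}_t^Y]\, 1_{C \times D}(X, Y)] = E_\mathcal{P}[f \circ Y\, 1_D(Y)\, \mathcal{P}(X \in C \mid \mathcal{G}_t^Y)]$

so that

$E_\mathcal{P}[f \circ Y \mid \mathcal{G}_t^Y] = E_\mathcal{P}[f \circ Y \mid \sigma(\mathcal{G}_t^Y \cup \mathcal{G}_t^X)]$
for any $f \in C_b(E)$ if and only if for any $C \in \mathcal{B}_t(S)$

$\mathcal{P}(X \in C \mid \mathcal{G}_t^Y) = \mathcal{P}(X \in C \mid \sigma(Y))$

which shows that (ii) is equivalent to (iii). □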

4. Topological properties of causal transference plans 

In this section we study the topology of the convex sets $\mathcal{P}_c^S(\eta)$ and $\Pi_c(\eta, \nu)$ introduced in
Definition [1]. Namely we prove these sets to be closed (resp. compact), while $L_a^0(\eta, S)$ is embedded in
$\mathcal{P}_c^S(\eta)$. These properties are gathered in Theorem [2], whose proof relies on the following lemma:

Lemma 1. For $\eta \in \mathcal{P}(E)$, $\nu \in \mathcal{P}(S)$ let $\gamma \in \Pi(\eta, \nu)$ (see (2.2)) and let

$(\gamma_n)_{n \in \mathbb{N}} \subset \mathcal{P}^S(\eta)$

be a sequence of probabilities, where $\mathcal{P}^S(\eta)$ was defined by (2.3). Then $(\gamma_n)$ converges to $\gamma$ in the weak
topology of probability measures if and only if for any Borel set $A \in \mathcal{B}(S)$ of $\nu$-continuity (i.e.
$\nu(\partial A) = 0$)

$\Theta_n(A) \to_{\sigma(L^1(\eta), L^\infty(\eta))} \Theta(A)$

i.e. the sequence $(\Theta_n(A))_{n \in \mathbb{N}}$ converges to $\Theta(A)$ in the weak topology $\sigma(L^1(\eta), L^\infty(\eta))$ of $L^1(\eta)$,
where $\Theta$ (resp. for any $n \in \mathbb{N}$, $\Theta_n$) is the conditional kernel of $\gamma$ (resp. of $\gamma_n$) with respect to $\eta$.
When this convergence occurs, and $A$ is a set of $\nu$-continuity,

$\liminf \Theta_n(A) \leq \Theta(A) \leq \limsup \Theta_n(A)$
Moreover, still in this case, we have that $\Theta_n(A) \to \Theta(A)$ strongly in $L^1(\eta)$ (or equivalently in $L^p(\eta)$ for any finite $p \geq 1$,
since $\Theta_n(A) \in [0,1]$) if and only if $\eta$-a.s.

$\Theta(A) \leq \liminf \Theta_n(A)$

Proof: The sufficiency follows directly by definition. Indeed let $A \in \mathcal{B}(S)$ be a set of $\nu$-continuity,
and let $B \in \mathcal{B}(E)$ be a set of $\eta$-continuity (i.e. $\nu(\partial A) = \eta(\partial B) = 0$). By definition we have

$\gamma_n(B \times A) = E_\eta[1_B \Theta_n(A)]$

and

$\gamma(B \times A) = E_\eta[1_B \Theta(A)]$
Hence, the weak convergence $\Theta_n(A) \to_{\sigma(L^1(\eta), L^\infty(\eta))} \Theta(A)$ directly yields

$\lim_{n \to \infty} \gamma_n(B \times A) = \gamma(B \times A)$

By the portmanteau theorem this ensures the convergence of $\gamma_n$ to $\gamma$. Conversely, to prove the
necessity we now assume that $\gamma_n \to \gamma$, and we further consider any $X \in L^\infty(\eta)$. For any $\epsilon > 0$ we
have to prove the existence of an $n_\epsilon$ such that for any $n > n_\epsilon$

(4.15) $|E_\eta[X \Theta_n(A)] - E_\eta[X \Theta(A)]| < \epsilon$
To prove this, for $\epsilon > 0$, we choose an $f^\epsilon \in C_b(E)$ such that

$E_\eta[|X - f^\epsilon|] < \frac{\epsilon}{4}$

For instance such an $f^\epsilon$ can be constructed directly by applying Lusin's theorem and then Tietze's
extension theorem. We then have

$|E_\eta[X \Theta_n(A)] - E_\eta[X \Theta(A)]| \leq |E_\eta[(X - f^\epsilon)\Theta_n(A)]| + |E_\eta[f^\epsilon(\Theta_n(A) - \Theta(A))]| + |E_\eta[(X - f^\epsilon)\Theta(A)]|$

$\leq 2 E_\eta[|X - f^\epsilon|] + |E_\eta[f^\epsilon(\Theta_n(A) - \Theta(A))]|$

so that

(4.16) $|E_\eta[X \Theta_n(A)] - E_\eta[X \Theta(A)]| < \frac{\epsilon}{2} + |E_\eta[f^\epsilon(\Theta_n(A) - \Theta(A))]|$
On the other hand by definition

$E_\eta[f^\epsilon(\Theta_n(A) - \Theta(A))] = \int_{E \times S} f^\epsilon(\omega) 1_A(\widetilde{\omega})\, d\gamma_n(\omega, \widetilde{\omega}) - \int_{E \times S} f^\epsilon(\omega) 1_A(\widetilde{\omega})\, d\gamma(\omega, \widetilde{\omega})$

Hence by applying the portmanteau theorem we can find an $n_\epsilon$ such that for any $n > n_\epsilon$
(4.17) $|E_\eta[f^\epsilon(\Theta_n(A) - \Theta(A))]| < \frac{\epsilon}{2}$

Substituting (4.17) into equation (4.16) shows that this $n_\epsilon$ satisfies (4.15), which yields the desired
convergence. We now turn to the last part of the claim, i.e. we assume the convergence. By applying
Fatou's Lemma together with the first part of the claim, we get that for any $X \in L^\infty(\eta)$ which is
positive and for any set $B \in \mathcal{B}(S)$ of $\nu$-continuity:

$E_\eta[\liminf \Theta_n(B) X] \leq \liminf E_\eta[\Theta_n(B) X] = E_\eta[\Theta(B) X]$

which implies $\eta$-a.s.

$\Theta(B) \geq \liminf \Theta_n(B)$
Similarly we obtain $\eta$-a.s.

$\Theta(B) \leq \limsup \Theta_n(B)$

Finally if $\Theta_n(B) \to \Theta(B)$ strongly in $L^1(\eta)$ we have $\Theta(B) \leq \liminf \Theta_n(B)$, while together with the
weak convergence in $L^1(\eta)$, this latter condition is well known to imply the strong convergence (see
g] Chapter I). □
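
The gap between weak and strong convergence of the kernels in Lemma 1 can be seen on a simple numerical sketch. The setting below is an assumption made purely for illustration and is not taken from the paper: E = [0,1] with the Lebesgue measure, S = {0,1}, and U_n(w) equal to the n-th binary digit of w, so that the kernel of (I_E x U_n)_* eta evaluated at A = {1} is the indicator that this digit is 1. These kernels converge weakly in L^1 to the constant 1/2 (the kernel of the product of eta with a fair coin), but their L^1 distance to 1/2 stays equal to 1/2.

```python
import numpy as np

N = 2 ** 16
omega = (np.arange(N) + 0.5) / N          # midpoints of a dyadic grid on [0, 1]

def theta_n(n):
    """Theta_n({1})(omega): 1 if the n-th binary digit of omega is 1, else 0."""
    return np.floor(omega * 2 ** n) % 2

def weak_pairing(n, X):
    """Approximation of E_eta[X * Theta_n({1})] by a midpoint rule."""
    return float(np.mean(X * theta_n(n)))

def l1_distance(n):
    """Approximation of the L^1(eta) distance between Theta_n({1}) and 1/2."""
    return float(np.mean(np.abs(theta_n(n) - 0.5)))

X = omega                                  # a bounded test function, E_eta[X] = 1/2
levels = (1, 4, 8, 12)
pairings = [weak_pairing(n, X) for n in levels]
distances = [l1_distance(n) for n in levels]
```

The values in `pairings` approach E_eta[X]/2 = 1/4 as n grows, while every entry of `distances` equals 1/2. This is consistent with the last part of the lemma: here the liminf of the kernels is 0 almost surely, which is strictly below the limit 1/2.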

The following is a short proof of the embedding of $L^0(\eta, S)$ in $\mathcal{P}^S(\eta)$ (see Section [2]), a fact which
may be well known.

Theorem 1. For any $\eta \in \mathcal{P}(E)$ the canonical injection

$L^0(\eta, S) \hookrightarrow \mathcal{P}^S(\eta)$

is an embedding, i.e. the mapping

$j : U \in L^0(\eta, S) \to (I_E \times U)_*\eta \in \mathcal{P}^S(\eta)$

defines a homeomorphism of $L^0(\eta, S)$ endowed with the topology of the convergence in probability onto
its image endowed with the topology of the weak convergence in measure.

Proof: Obviously $j$ realizes a bijection onto its image. We first prove that it is continuous, then
we prove that its inverse is also continuous. Let $(U_n) \subset L^0(\eta, S)$ be a sequence which converges
in probability to a $U \in L^0(\eta, S)$. By extracting a subsequence which converges $\eta$ almost surely, the
dominated convergence theorem yields that for any $f \in C_b(E \times S)$

$E_{j(U)}[f] := E_{(I_E \times U)_*\eta}[f] = E_\eta[f(I_E, U)] = \lim E_\eta[f(I_E, U_n)] = \lim E_{j(U_n)}[f]$

i.e. $j(U_n) \to j(U)$ and $j$ is continuous. Conversely we now assume that we have a sequence
$(U_n) \subset L^0(\eta, S)$ and a $U \in L^0(\eta, S)$ such that $(j(U_n))_{n \in \mathbb{N}}$ converges to $j(U)$, and we want to prove
that $(U_n)$ converges to $U$ in probability. By definition $j(U)$ and (for any $n \in \mathbb{N}$) $j(U_n)$ are
elements of $\mathcal{P}^S(\eta)$. Moreover, the conditional probability kernel of $j(U)$ (resp. for any $n \in \mathbb{N}$, of
$j(U_n)$) with respect to $\eta$ is given by $\delta_U$ (resp. for $n \in \mathbb{N}$ by $\delta_{U_n}$). In the above expression, for $\omega \in E$,
$\delta_{U(\omega)}$ denotes the Dirac measure on $S$ concentrated at $U(\omega)$. Since by hypothesis $(j(U_n))_{n \in \mathbb{N}} \subset \mathcal{P}^S(\eta)$
converges to $j(U)$, Lemma [1] implies that for any $A \in \mathcal{B}(S)$ which is a set of $U_*\eta$-continuity (i.e.
$U_*\eta(\partial A) = 0$) we have

$\delta_{U_n}(A) \to_{\sigma(L^1(\eta), L^\infty(\eta))} \delta_U(A)$

in the weak topology $\sigma(L^1(\eta), L^\infty(\eta))$ of $L^1(\eta)$. Hence for any sets $A, B \in \mathcal{B}(S)$ of $U_*\eta$-continuity
we have

$\lim_{n \to \infty} (U_n \times U)_*\eta(A \times B) = \lim_{n \to \infty} E_\eta[\delta_{U_n}(A)\delta_U(B)] = E_\eta[\delta_U(A)\delta_U(B)] = (U \times U)_*\eta(A \times B)$

so that $((U_n \times U)_*\eta)_{n \in \mathbb{N}}$ converges to $(U \times U)_*\eta$ in the topology of the weak convergence of probability
measures. On the other hand, let $d_S$ be a distance on $S$ which is compatible with its topology. For
any $\epsilon > 0$ we set

$\Omega_\epsilon := \{(x, y) \in S \times S : d_S(x, y) > \epsilon\}$

so that

$(U \times U)_*\eta(\Omega_\epsilon) = 0$
and $\Omega_\epsilon$ is a set of $(U \times U)_*\eta$-continuity (i.e. $(U \times U)_*\eta(\partial\Omega_\epsilon) = 0$). By the weak convergence in
measure of $((U_n \times U)_*\eta)_{n \in \mathbb{N}}$ to $(U \times U)_*\eta$ we finally obtain for any $\epsilon > 0$

$\lim \eta(\{\omega \in E : d_S(U_n, U) > \epsilon\}) = \lim (U_n \times U)_*\eta(\Omega_\epsilon) = (U \times U)_*\eta(\Omega_\epsilon) = 0$

which means that $(U_n)$ converges in probability to $U$. Hence $j$ is a bijection onto its image which is
continuous, with a continuous inverse. □



Theorem 2. For any $\eta \in \mathcal{P}(E)$, $\nu \in \mathcal{P}(S)$ the following hold for the convex sets $\mathcal{P}_c^S(\eta)$ and $\Pi_c(\eta, \nu)$
which were defined in Definition [1]:

(i) $\mathcal{P}_c^S(\eta)$ is convex and closed in $\mathcal{P}(E \times S)$ for the topology of the weak convergence in measure.
(ii) $\Pi_c(\eta, \nu)$ is a non-empty convex set which is compact in $\mathcal{P}(E \times S)$ for the topology of the weak
convergence in measure.
(iii) $L_a^0(\eta, S)$ is a closed subset of $L^0(\eta, S)$ (see Section [2]) for the topology of the convergence in
probability, which is canonically embedded in $\mathcal{P}_c^S(\eta)$.
(iv) $\mathcal{R}_a(\eta, \nu)$ (see Section [2]) is a closed set for the convergence in probability.
(v) Let $(\gamma_n)$ be a sequence of elements of $\mathcal{P}_c^S(\eta)$ and further assume that $(\widetilde{\pi}_*\gamma_n)$ is tight; then there
exists a $\gamma \in \mathcal{P}_c^S(\eta)$ and a subsequence $(\gamma_{k(n)})$ of $(\gamma_n)$ which converges weakly in measure to $\gamma$.

Proof: The convexity in (i) and (ii) is trivial to see from Definition [1]. We now show that
the whole result relies on (i). Since the convergence in probability implies the convergence in law,
$\mathcal{R}(\eta, \nu)$ is closed. Furthermore we have $\mathcal{R}_a(\eta, \nu) = \mathcal{R}(\eta, \nu) \cap L_a^0(\eta, S)$ so that (iv) follows from (iii).
On the other hand by Theorem [1], (iii) follows from (i). Note that $\eta \otimes \nu \in \Pi_c(\eta, \nu)$ because its
conditional kernel with respect to $\eta$ is $\nu$. Moreover $\Pi(\eta, \nu)$ is well known to be weakly compact (for
instance see [48] p. 45) and $\Pi_c(\eta, \nu) = \Pi(\eta, \nu) \cap \mathcal{P}_c^S(\eta)$ so that (ii) also follows from (i). Similarly,
whenever the sequence $(\nu_n)$ of the second marginals of the $\gamma_n$ is tight, $\cup_n \Pi(\eta, \nu_n)$ is tight (see [48]
p. 45). Hence we can extract a subsequence $(\gamma_{k(n)}) \subset \mathcal{P}_c^S(\eta)$ which converges to a $\gamma$. By continuity of
the projection on the first marginal, $\gamma \in \mathcal{P}^S(\eta)$ so that (v) follows from (i). Hence we just have to
prove (i). Before we begin the proof we first note that for any $\nu \in \mathcal{P}(S)$, proving that a
probability $\gamma \in \Pi(\eta, \nu)$ lies in the subset $\Pi_c(\eta, \nu)$ amounts to proving that for any $t \in [0, 1]$ and for
any $A \in \mathcal{B}_t(S)$ which is a set of $\nu$-continuity (i.e. $\nu(\partial A) = 0$) the mapping $\omega \to \Theta^\omega(A)$ is
$\mathcal{F}_t^\eta$ measurable, where $\Theta$ is the conditional probability kernel of $\gamma$ with respect to $\eta$. To see this,
let $\gamma_t$ be the probability on $E \times S$ with the sigma-field $\mathcal{B}(E) \otimes \mathcal{B}_t(S)$ whose action is defined on
$C \in \mathcal{B}(E)$ and $A \in \mathcal{B}_t(S)$ by

$$\gamma_t(C \times A) := E_\eta\left[ E_\eta[\Theta(A) | \mathcal{F}_t^\eta] 1_C \right]$$

Then $\gamma \in \Pi_c(\eta, \nu)$ is clearly equivalent to

$$\gamma_t = \gamma|_{\mathcal{B}(E) \otimes \mathcal{B}_t(S)}$$

i.e. the restriction of $\gamma$ to $\mathcal{B}(E) \otimes \mathcal{B}_t(S)$. On the other hand the last equality holds if and only if
it holds on the sets of the form $C \times A$ where $C \in \mathcal{B}(E)$, $A \in \mathcal{B}_t(S)$ and $A$ is a set of $\nu$-continuity.
Moreover this latter is equivalent to proving that for any $A \in \mathcal{B}_t(S)$ which is a set of $\nu$-continuity
the mapping $\omega \to \Theta^\omega(A)$ is $\mathcal{F}_t^\eta$ measurable, which shows that we can restrict ourselves to the sets
of $\nu$-continuity. We now turn to the main body of the proof. Let $(\gamma_n)_{n \in \mathbb{N}} \subset \mathcal{P}_c^S(\eta)$ be a sequence
which converges to a probability $\gamma \in \mathcal{P}(E \times S)$ in the weak topology of probability measures. By
continuity of $\pi_*$ we have $\pi_*\gamma = \eta$ i.e. $\gamma \in \mathcal{P}^S(\eta)$. Further denote by $\Theta$ (resp. for any $n \in \mathbb{N}$, by
$\Theta_n$) the conditional probability kernel of $\gamma$ (resp. of $\gamma_n$) with respect to $\eta$, and let $\nu$ be the second
marginal of $\gamma$. We now have to prove that for any $t \in [0, 1]$ and for any $A \in \mathcal{B}_t(S)$ which is such that
$\nu(\partial A) = 0$, $\omega \to \Theta^\omega(A)$ is $\mathcal{F}_t^\eta$ measurable. From Lemma [1] we have that

$$\Theta_n(A) \to \Theta(A)$$

weakly in $L^1(\eta)$, and by hypothesis for any $n \in \mathbb{N}$, $\Theta_n(A)$ is $\mathcal{F}_t^\eta$ measurable. On the other hand, as
a linear mapping which is strongly continuous in $L^1(\eta)$, the conditional expectation $E_\eta[\,.\,|\mathcal{F}_t^\eta]$ is also
continuous for the weak topology $\sigma(L^1(\eta), L^\infty(\eta))$. Hence

$$E_\eta[\Theta(A) | \mathcal{F}_t^\eta] = \lim_{n \to \infty} E_\eta[\Theta_n(A) | \mathcal{F}_t^\eta] = \lim_{n \to \infty} \Theta_n(A) = \Theta(A)$$

which proves that $\omega \to \Theta^\omega(A)$ is $\mathcal{F}_t^\eta$ measurable so that $\gamma \in \mathcal{P}_c^S(\eta)$. This latter is therefore closed
and the proof is complete. □



5. Monge-Kantorovich problems for causal transports 

In this section we define the causal counterparts to the Monge-Kantorovich (resp. to the Monge)
problems (Definition [2]) and we prove the existence of a solution to the former under very weak
assumptions (Theorem [3]). Although the general study of the dual formulation of these problems
as well as the precise geometry of these optimal causal transports goes far beyond the scope of this
paper, a more detailed study of the optimal plans will be given in the particular case investigated
in Section [8].

Consider $\eta \in \mathcal{P}(E)$ and $\nu \in \mathcal{P}(S)$ where $E$ and $S$ still denote the Polish spaces of Section [2]. In
view of the causal case we recall that both of them are assumed to be endowed with a filtration of
their Borel sigma-field. We first recall the definition of the problems of optimal transport and we
refer to [47] and [48] for a general overview of this topic. Informally a problem of optimal transport
is the search for the cheapest way to transport the mass $\eta$ to the mass $\nu$. As it was recalled in
Section [2], one may think informally of any $\gamma \in \Pi(\eta, \nu)$ as a way to transport a distribution of mass
$\eta$ to a distribution of mass $\nu$. Problems of optimal transport can be defined once a measurable
function

$$c : E \times S \to \mathbb{R} \cup \{\infty\}$$

is given. This latter is called the cost function because for a given $(x, y) \in E \times S$ the value $c(x, y)$
may be thought of as being the cost to bring an element located at $x \in E$ to a $y \in S$. The cost of a
given transference plan $\gamma \in \mathcal{P}(E \times S)$ is then naturally defined by

$$\int_{E \times S} c(x, y) d\gamma(x, y)$$

A Monge-Kantorovich problem consists in finding a $\gamma^*$ which attains the infimum of the right hand
term in

$$\mathcal{T}(\nu|\eta) := \inf\left(\left\{ \int_{E \times S} c(x, y) d\gamma(x, y) : \gamma \in \Pi(\eta, \nu) \right\}\right) \tag{5.18}$$

In the case where $E = S$ with a distance $d_E$, and $c(x, y) = (d_E(x, y))^p$ for a $p \in [1, \infty)$,
$(\eta, \nu) \to W_p(\eta, \nu) := (\mathcal{T}(\nu|\eta))^{1/p}$ is called the Wasserstein distance of order $p$. Whenever $\gamma^*$ is of the shape

$$\gamma^* = (I_E \times T)_*\eta$$

where $T \in \mathcal{R}(\eta, \nu)$ (defined earlier), the mapping $T$ obviously solves the so called Monge problem

$$\inf\left(\left\{ \int_E c(x, X(x)) d\eta(x) : X \in \mathcal{R}(\eta, \nu) \right\}\right)$$
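To make the (non causal) Monge problem and the Wasserstein distance of order p concrete, here is a minimal sketch on the real line. It is not taken from the paper: the sample points `xs`, `ys` and the helper names are hypothetical, and both measures are assumed to be uniform on the same number of distinct atoms, so that the maps pushing one measure onto the other are exactly the bijections between atoms. The brute-force infimum over bijections is compared with the classical monotone (sorted) pairing, which is known to be optimal on the real line for the cost |x - y|^p with p >= 1.

```python
from itertools import permutations

def transport_cost(xs, ys, p):
    """Average cost of sending xs[i] to ys[i] for c(x, y) = |x - y|**p."""
    return sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)

def monge_brute_force(xs, ys, p):
    """Infimum of the Monge problem, searched over all bijections of the atoms."""
    return min(transport_cost(xs, perm, p) for perm in permutations(ys))

def wasserstein_sorted(xs, ys, p):
    """W_p computed with the monotone pairing (real line, p >= 1)."""
    return transport_cost(sorted(xs), sorted(ys), p) ** (1.0 / p)

# Hypothetical atoms of two uniform measures on the real line.
xs = [0.0, 1.0, 3.0, 6.0]
ys = [5.0, 2.0, 0.5, 4.0]
p = 2

w_brute = monge_brute_force(xs, ys, p) ** (1.0 / p)
w_sorted = wasserstein_sorted(xs, ys, p)
print(w_brute, w_sorted)
```

For uniform measures with the same number of atoms the Monge-Kantorovich infimum is attained at a bijection, so in this toy case the two problems have the same value; this is specific to the sketch and not a statement about the general setting above.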



Similarly, the following defines the causal counterpart to these problems:

Definition 2. To any measurable function $c : E \times S \to \mathbb{R} \cup \{\infty\}$, and any $\eta \in \mathcal{P}(E)$ and $\nu \in \mathcal{P}(S)$
we associate the following problems:

(1) (Causal Monge-Kantorovich problems) By a causal Monge-Kantorovich problem we mean a
variational problem of the shape

$$\inf\left(\left\{ \int_{E \times S} c(x, y) d\gamma(x, y) : \gamma \in \Pi_c(\eta, \nu) \right\}\right)$$

(2) (Causal Monge problem) By a causal Monge problem we mean a variational problem of the
shape

$$\inf\left(\left\{ \int_E c(x, U(x)) \eta(dx) : U \in \mathcal{R}_a(\eta, \nu) \right\}\right)$$

Note that by taking $\mathcal{B}_t(E) = \mathcal{B}(E)$ and $\mathcal{B}_t(S) = \mathcal{B}(S)$ for any $t \in [0, 1]$ (see Section [2]), the usual
optimal transport problems appear as particular cases of such problems. Moreover whenever
the optimum of the causal Monge-Kantorovich problem exists and is a deterministic coupling plan,
it induces a solution to the causal Monge problem. We now provide a general result of existence for
these causal problems in the general setting:

Theorem 3. Let $\eta \in \mathcal{P}(E)$ and $\nu \in \mathcal{P}(S)$ and let $c : E \times S \to \mathbb{R} \cup \{\infty\}$ be a lower semicontinuous
function such that $c(x, y) \geq 0$ for any $(x, y) \in E \times S$. Then there exists an optimal $\gamma \in \Pi_c(\eta, \nu)$
which attains the infimum

$$\inf\left(\left\{ \int_{E \times S} c(x, y) d\gamma(x, y) : \gamma \in \Pi_c(\eta, \nu) \right\}\right) \tag{5.19}$$

Proof: It is well known that the hypotheses on $c$ yield the lower semicontinuity of

$$\gamma \in \mathcal{P}(E \times S) \to \int_{E \times S} c(x, y) d\gamma(x, y)$$

with respect to the weak topology of probability measures (see Lemma 4.3 of [48]). On the other hand
we proved in Theorem [2] that $\Pi_c(\eta, \nu)$ is a nonempty compact set. Since any lower semicontinuous
function on a compact set attains its infimum, this proves the result. □
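The effect of the causality constraint can be seen on a toy version of the Monge problems of Definition 2. The sketch below is an illustration under assumptions that are not in the paper: time is reduced to two periods, E = S is the set of the four binary paths (x1, x2), both marginals are uniform, the information available at the first period is the first coordinate only, and the cost is a hypothetical one chosen so that the cheapest map is the anticipative swap (x1, x2) -> (x2, x1). A map is treated as adapted when the first coordinate of its image depends on x1 only.

```python
from itertools import permutations, product

# Two-period binary paths (x1, x2); assumption: uniform marginals on these 4 paths.
paths = list(product([0, 1], repeat=2))

def cost(x, y):
    # Hypothetical cost, minimal for the anticipative swap (x1, x2) -> (x2, x1).
    return (x[1] - y[0]) ** 2 + (x[0] - y[1]) ** 2

def is_adapted(mapping):
    """True if the first coordinate of the image depends on x only through x1."""
    first = {}
    for x, y in mapping.items():
        if first.setdefault(x[0], y[0]) != y[0]:
            return False
    return True

def average_cost(mapping):
    return sum(cost(x, y) for x, y in mapping.items()) / len(mapping)

# Maps pushing the uniform measure onto itself are the bijections of the 4 paths.
maps = [dict(zip(paths, perm)) for perm in permutations(paths)]
adapted_maps = [m for m in maps if is_adapted(m)]

monge_value = min(average_cost(m) for m in maps)
causal_monge_value = min(average_cost(m) for m in adapted_maps)
print(len(maps), len(adapted_maps), monge_value, causal_monge_value)
```

In this toy case the unconstrained Monge infimum is reached by a map which looks into the future of the path, while every adapted map pays a strictly positive cost. The sketch only compares the two Monge problems; the value of the causal Monge-Kantorovich problem, where randomized plans are allowed, is not computed here.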



Corollary 1. Let $\eta \in \mathcal{P}(E)$ and let $c : E \times S \to \mathbb{R} \cup \{\infty\}$ be a function which satisfies the hypotheses
of Theorem [3]. For any $\nu \in \mathcal{P}(S)$ we note

$$\mathcal{S}(\nu|\eta) := \inf\left(\left\{ \int_{E \times S} c(x, y) d\gamma(x, y) : \gamma \in \Pi_c(\eta, \nu) \right\}\right) \tag{5.20}$$

Then the function

$$\nu \in \mathcal{P}(S) \to \mathcal{S}(\nu|\eta) \in \mathbb{R}_+ \cup \{\infty\}$$

is a convex function which is lower semicontinuous for the topology of weak convergence of probability
measures. In particular let $C_S$ be any nonempty compact subset of $\mathcal{P}(S)$ (for the topology of the weak
convergence of probability measures), then the infimum

$$\inf(\{\mathcal{S}(\nu|\eta) : \nu \in C_S\}) \tag{5.21}$$

is attained.

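Proof: We first prove the convexity. Let $\nu, \tilde\nu \in \mathcal{P}(S)$ and $\lambda \in [0, 1]$. By Theorem [3] there is a
probability $\gamma^\nu \in \Pi_c(\eta, \nu)$ (resp. $\gamma^{\tilde\nu} \in \Pi_c(\eta, \tilde\nu)$) which attains the infimum $\mathcal{S}(\nu|\eta)$ (resp. $\mathcal{S}(\tilde\nu|\eta)$) of
the associated causal Monge-Kantorovich problem. By definition of the causal transference plans,
for any $\lambda \in [0, 1]$, $\lambda\gamma^\nu + (1 - \lambda)\gamma^{\tilde\nu} \in \Pi_c(\eta, \lambda\nu + (1 - \lambda)\tilde\nu)$. Hence by definition

$$\mathcal{S}(\lambda\nu + (1 - \lambda)\tilde\nu|\eta) \leq \lambda \int_{E \times S} c(x, y) d\gamma^\nu(x, y) + (1 - \lambda) \int_{E \times S} c(x, y) d\gamma^{\tilde\nu}(x, y) = \lambda\mathcal{S}(\nu|\eta) + (1 - \lambda)\mathcal{S}(\tilde\nu|\eta)$$

We now prove the lower semicontinuity. For $\lambda \in \mathbb{R}$ we note

$$G_\lambda := \{\nu \in \mathcal{P}(S) \,|\, \mathcal{S}(\nu|\eta) \leq \lambda\}$$

the associated level set, and we consider a sequence $(\nu_n) \subset \mathcal{P}(S)$ which converges weakly to a
$\nu \in \mathcal{P}(S)$ and which further satisfies $\nu_n \in G_\lambda$ for any $n \in \mathbb{N}$. By Theorem [3], for any $n \in \mathbb{N}$ there is
a $\gamma_n \in \Pi_c(\eta, \nu_n)$ such that

$$\mathcal{S}(\nu_n|\eta) = \int_{E \times S} c(x, y) d\gamma_n(x, y)$$

Since $(\nu_n)$ is tight, by (v) of Theorem [2] and by continuity of the projections, we can extract a
subsequence $(\gamma_{k(n)})$ of $(\gamma_n)$ which converges to a $\gamma \in \Pi_c(\eta, \nu)$. Hence by definition of $\mathcal{S}(.|\eta)$ we obtain

$$\mathcal{S}(\nu|\eta) \leq \int_{E \times S} c(x, y) d\gamma(x, y) \tag{5.22}$$

On the other hand, as it was recalled in the proof of Theorem [3], $\gamma \to \int_{E \times S} c(x, y) d\gamma(x, y)$ is lower
semicontinuous so that

$$\int_{E \times S} c(x, y) d\gamma(x, y) \leq \liminf_{n \to \infty} \int_{E \times S} c(x, y) d\gamma_{k(n)}(x, y) = \liminf_{n \to \infty} \mathcal{S}(\nu_{k(n)}|\eta) \leq \lambda \tag{5.23}$$

By gathering (5.22) and (5.23) we obtain

$$\mathcal{S}(\nu|\eta) \leq \lambda$$

which proves that the level sets $G_\lambda$ are closed. Hence $\mathcal{S}(.|\eta)$ is lower semicontinuous and convex.

□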



Remark 1. By definition

$$\mathcal{S}(\nu|\eta) \geq \mathcal{T}(\nu|\eta) \tag{5.24}$$

and the quantity $\mathcal{S}(\nu|\eta) - \mathcal{T}(\nu|\eta)$ may be seen informally as the fair price to pay at $t = 0$ to buy
the whole information contained in $\mathcal{B}(E)$. In Section [8] we will see that in the case of the optimal
transports of the Wiener measure for the quadratic cost, this latter inequality is exactly the Talagrand
inequality.
6. Causal transport problems in stochastic optimal control

In this section we will see that causal transference plans naturally fit in stochastic calculus, and
we will show how the problems of Section [5] are involved in the problems of stochastic optimal control
investigated in Mikami's papers. As a matter of fact these connexions already appear very
clearly in the lines of the proof of the Yamada-Watanabe criterion (see [18]) where the definition of
causal transport plans appears implicitly. For the sake of simplicity as well as for our applications
in Section [8] we focus on the space of the continuous paths $W := C([0, 1], \mathbb{R}^d)$ which is defined in
Section [2] and we note $\mu$ the law of the standard $\mathbb{R}^d$-valued Brownian motion, i.e. $\mu$ is such that
the coordinate process $(t, \omega) \in [0, 1] \times W \to W_t(\omega) \in \mathbb{R}^d$ is a Brownian motion. We recall that we
note $(\mathcal{B}_t(W))$ the filtration generated by the coordinate process, and that $\mathcal{B}_1(W) = \mathcal{B}(W)$.

Proposition [2] focuses on the case where $E = W := C([0, 1], \mathbb{R}^d)$ (see Section [2]), endowed with
the filtration generated by the coordinate process i.e. $(\mathcal{B}_t(E)) = (\mathcal{B}_t(W))$. It shows that causal
couplings are usually handled in stochastic optimal control and appear explicitly in the definition of
a solution to stochastic differential equations.

Proposition 2. We take $E = W := C([0, 1], \mathbb{R}^d)$ endowed with the filtration generated by the
coordinate process $(\mathcal{B}_t(W))$, while $S$ still denotes the general filtered Polish space introduced earlier.
Further consider a complete probability space $(\Omega, \mathcal{A}, \mathcal{P})$ with two measurable mappings

$$X : \Omega \to S$$

and

$$Y : \Omega \to W$$

and assume that $t \to Y_t$ is a Brownian motion with respect to its own filtration. Then the following
are equivalent:

(i) $(Y \times X)_*\mathcal{P} \in \Pi_c(Y_*\mathcal{P}, X_*\mathcal{P})$ i.e. $(Y, X)$ is a causal coupling.
(ii) $t \to Y_t$ is a $\sigma(\mathcal{G}_t^X \cup \mathcal{G}_t^Y)$-Brownian motion, where $(\mathcal{G}_t^X)$ (resp. $(\mathcal{G}_t^Y)$) is the filtration generated
by $X$ (resp. by $Y$) defined earlier.
(iii) There is a filtration $(\mathcal{A}_t)$ such that $t \to Y_t$ is an $(\mathcal{A}_t)$-Brownian motion and $t \to X_t$ is adapted
to $(\mathcal{A}_t)$.
(iv) Assertion (iii) holds on the space $(W \times W, (\mathcal{F}_t^\gamma), \gamma)$ for the two processes obtained by composing
$t \to W_t$ with the projection on the first (resp. on the second) factor of the product space, where
$(\mathcal{F}_t^\gamma)$ is the filtration on the product space defined earlier.

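Proof: By taking $(\mathcal{A}_t) := (\sigma(\mathcal{G}_t^X \cup \mathcal{G}_t^Y))$ we see that (ii) implies (iii). Conversely we assume that (iii)
holds for a given filtration $(\mathcal{A}_t)$. Since both $X$ and $Y$ are $(\mathcal{A}_t)$-adapted we have $\sigma(\mathcal{G}_t^X \cup \mathcal{G}_t^Y) \subset \mathcal{A}_t$
for any $t \in [0, 1]$. Thus, since $Y$ is an $(\mathcal{A}_t)$-martingale, for any $s < t$ we have

$$E_\mathcal{P}[Y_t - Y_s | \sigma(\mathcal{G}_s^X \cup \mathcal{G}_s^Y)] = E_\mathcal{P}\left[ E_\mathcal{P}[Y_t - Y_s | \mathcal{A}_s] \,|\, \sigma(\mathcal{G}_s^X \cup \mathcal{G}_s^Y) \right] = 0$$

By Lévy's criterion this proves that (iii) $\Leftrightarrow$ (ii). We now assume that (ii) holds and we want
to prove (i). We set $\rho_t : \omega \in W \to \omega_{. \wedge t} \in W$ to be the coordinate process stopped at $t$ and we
define $\theta_t : \omega \in W \to 1_{. > t}(W_. - W_t) \in W$. For a given $t$ we now consider a set $C$ of the form
$C = \rho_t^{-1}(A_1) \cap \theta_t^{-1}(A_2)$ for $A_1, A_2 \in \mathcal{B}(W)$, and an $A \in \mathcal{B}_t(S)$. By hypothesis $(\theta_t \circ Y)^{-1}(\mathcal{B}(W))$ is
$\mathcal{P}$-independent of $\sigma(\mathcal{G}_t^X \cup \mathcal{G}_t^Y)$ and of $\mathcal{G}_t^Y$ so that we get

$$\begin{aligned}
E_\mathcal{P}[1_A \circ X \, 1_C \circ Y] &= E_\mathcal{P}[1_A \circ X \, 1_{\rho_t^{-1}(A_1)} \circ Y \, 1_{A_2} \circ (\theta_t \circ Y)] \\
&= E_\mathcal{P}[1_A \circ X \, 1_{\rho_t^{-1}(A_1)} \circ Y] \, E_\mathcal{P}[1_{A_2} \circ (\theta_t \circ Y)] \\
&= E_\mathcal{P}\left[ E_\mathcal{P}[1_A \circ X | \mathcal{G}_t^Y] \, 1_{\rho_t^{-1}(A_1)} \circ Y \right] E_\mathcal{P}[1_{A_2} \circ (\theta_t \circ Y)] \\
&= E_\mathcal{P}\left[ \mathcal{P}(X \in A | \mathcal{G}_t^Y) \, 1_{\rho_t^{-1}(A_1)} \circ Y \right] E_\mathcal{P}[1_{A_2} \circ (\theta_t \circ Y)] \\
&= E_\mathcal{P}\left[ \mathcal{P}(X \in A | \mathcal{G}_t^Y) \, 1_{\rho_t^{-1}(A_1)} \circ Y \, 1_{A_2} \circ (\theta_t \circ Y) \right] \\
&= E_\mathcal{P}\left[ \mathcal{P}(X \in A | \mathcal{G}_t^Y) \, 1_C \circ Y \right]
\end{aligned}$$

from which we obtain

$$\mathcal{P}(X \in A | \mathcal{G}_t^Y) = \mathcal{P}(X \in A | \sigma(Y))$$

for any $A \in \mathcal{B}_t(S)$ and by applying Proposition [1] we get (i). Conversely, we assume that (i) holds.
By Proposition [1] we get, for any $s < t$ and any $\lambda \in \mathbb{R}^d$,

$$E_\mathcal{P}[\exp(i \langle \lambda, Y_t - Y_s \rangle_{\mathbb{R}^d}) | \sigma(\mathcal{G}_s^X \cup \mathcal{G}_s^Y)] = E_\mathcal{P}[\exp(i \langle \lambda, Y_t - Y_s \rangle_{\mathbb{R}^d}) | \mathcal{G}_s^Y] = \exp\left( -(t - s) \frac{|\lambda|^2_{\mathbb{R}^d}}{2} \right)$$

which proves (ii). The equivalence with (iv) is trivial. □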

Henceforth, we take $E = S = W$ both endowed with the filtration generated by the coordinate
process $(\mathcal{B}_t(W))$. Definition [3] defines a class of problems of stochastic optimal control which is
the paradigm of those investigated in Mikami's papers (for instance see p. 3 of [31]). Proposition [3]
relates these problems to the causal Monge-Kantorovich problems of Section [5].

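Definition 3. Let $L : \mathbb{R}^d \to \mathbb{R}_+$ be a measurable function, and let $Q_0, Q_1 \in \mathcal{P}(\mathbb{R}^d)$ be two Borel
probabilities on $\mathbb{R}^d$. Further note $W_1$ the set of the $\omega \in W$ which are absolutely continuous with
respect to the Lebesgue measure i.e. the paths $\omega \in W$ for which there is an integrable $(\dot\omega_s, s \in [0, 1])$
such that $\omega = \int_0^. \dot\omega_s ds$. We define

$$V(Q_0, Q_1) := \inf\left(\left\{ E_\mathcal{P}\left[ \int_0^1 L(\dot u_s) ds \right] : (u, X^u) \in \mathcal{C}, \ (X^u)_*\mathcal{P} \in \Sigma(Q_0, Q_1) \right\}\right) \tag{6.25}$$

where

$$\Sigma(Q_0, Q_1) = \{\nu \in \mathcal{P}(W) \,|\, (W_0)_*\nu = Q_0, \ (W_1)_*\nu = Q_1\} \tag{6.26}$$

and where $\mathcal{C}$ is the set of the pairs of processes $(u, X^u)$ defined on a complete filtered space $(\Omega, \mathcal{A}, \mathcal{P})$
with a filtration $(\mathcal{A}_t)$ such that

(i) We have $\mathcal{P}$-a.s. $u \in W_1$ and in this case we note $(\dot u_s)$ the associated density i.e.

$$u = \int_0^. \dot u_s ds$$

(ii) There exist two continuous processes $(X^u, B)$ defined on that space such that $t \to X_t^u$ is
$(\mathcal{A}_t)$-adapted, $t \to B_t$ is an $(\mathcal{A}_t)$-Brownian motion and $\mathcal{P}$-a.s.

$$X_t^u = B_t + \int_0^t \dot u_s ds$$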
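Proposition 3. Let $Q_0, Q_1 \in \mathcal{P}(\mathbb{R}^d)$, and let $L$, $W_1$, $\Sigma(Q_0, Q_1)$ be as in Definition [3]. We define
the cost function $c : W \times W \to \mathbb{R} \cup \{\infty\}$ associated to $L$ by setting

$$c(x, y) = \int_0^1 L(\dot z_s) ds$$

if $z := x - y \in W_1$ and $c(x, y) = \infty$ if $x - y \in W \setminus W_1$. Further consider the following variational
problem

$$\widetilde{V}(Q_0, Q_1) = \inf(\{\mathcal{S}(\nu|\mu) \,|\, \nu \in \Sigma(Q_0, Q_1)\}) \tag{6.27}$$

where $\mathcal{S}(\nu|\mu)$ is defined by

$$\mathcal{S}(\nu|\mu) := \inf\left(\left\{ \int_{W \times W} c(x, y) d\gamma(x, y) : \gamma \in \Pi_c(\mu, \nu) \right\}\right) \tag{6.28}$$

and where $\Sigma(Q_0, Q_1)$ is given by (6.26). Then we have

$$\widetilde{V}(Q_0, Q_1) = V(Q_0, Q_1) \tag{6.29}$$

where $V(Q_0, Q_1)$ is defined by (6.25).

Proof: Let $\hat\pi_* : \gamma \in \mathcal{P}(W \times W) \to \hat\pi_*\gamma \in \mathcal{P}(W)$ be the mapping which associates to $\gamma$ its second
marginal. By setting

$$X_a(Q_0, Q_1) := \mathcal{P}_c^W(\mu) \cap (\hat\pi_*)^{-1}(\Sigma(Q_0, Q_1))$$

it is trivial to check that

$$\widetilde{V}(Q_0, Q_1) = \inf\left(\left\{ \int_{W \times W} c(x, y) d\gamma(x, y) : \gamma \in X_a(Q_0, Q_1) \right\}\right) \tag{6.30}$$

By Proposition [2] the result (6.29) directly follows from (6.30) and from the definitions.

□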



Remark 2. This remark anticipates some results of Section [8]. In the case where

$$L(x) := \frac{|x|^2_{\mathbb{R}^d}}{2}$$

we will see in Theorem [4] that

$$\mathcal{S}(\nu|\mu) = H(\nu|\mu) := E_\nu\left[ \ln \frac{d\nu}{d\mu} \right] \tag{6.31}$$

Hence in this case (6.29) reads

$$V(Q_0, Q_1) = \inf(\{H(\nu|\mu) \,|\, (W_0)_*\nu = Q_0, \ (W_1)_*\nu = Q_1\}) \tag{6.32}$$

Whenever $Q_0 \ll \lambda$ and $Q_1 \ll \lambda$ (i.e. absolutely continuous with respect to the Lebesgue measure
$\lambda$ on $\mathbb{R}^d$), the right hand term of (6.32) is a Schrödinger bridge (see [14], [49], [51]). Hence
we recover the (very) well known connexion between Schrödinger bridges and Mikami's problems,
which is the entropic formulation of quantum Euclidean mechanics (Q.E.M.) (see [14] together
with [49], [50], [51]). Thus, the novelty of Proposition [3] is to show that the couplings of Q.E.M. are
the result of a variation over optima of the causal Monge-Kantorovich problems on the product space
of Section [5]. Finally, note that Q.E.M. is well known to model
some physical systems of statistical mechanics, typically some spin chain under thermal agitation.
In the couplings $(B, X)$, the Brownian motion $t \to B_t$ then models the thermal effects and the optimal
coupling of (6.31) models the causal answer of the spin chain to the random shocks of many little
particles (modeled by $B$). For that reason the optimal couplings of Q.E.M. are physically expected to
be deterministic couplings (as defined earlier). The proof of such facts is not trivial for marginals whose
density is not smooth, and relies on the localization of Zvonkin's celebrated result (see [23] in the
case of $h$-path processes i.e. when $Q_0 = \delta_x$, $x \in \mathbb{R}^d$).

7. A brief reminder on the transformations of the Wiener measure 

To prepare Section [8] we are now going to recall some basic facts about Wiener spaces and about
transformations of the Wiener measure. In particular we will recall the definition of a class of
shifts associated to probabilities absolutely continuous with respect to the Wiener measure, which
we call the Girsanov shifts. These shifts have a long history, which is deeply related to stochastic
mechanics, and we refer to the references given below for further details. The second part of this
section introduces the basic objects of Malliavin calculus which make it possible to compute the
Girsanov shifts explicitly. This part is quite technical and the reader who is not already acquainted
with this field may skip the last part of this section on a first reading.

We now introduce some basic definitions related to the Wiener spaces (see [5D] for a detailed
presentation of these spaces). We still note $W$ the space $C([0, 1], \mathbb{R}^d)$ of the $\mathbb{R}^d$-valued continuous
paths, i.e. the space of the continuous

$$\rho : t \in [0, 1] \to \rho(t) \in \mathbb{R}^d$$

As it is well known $W$ is a Banach space when it is endowed with the norm of the uniform convergence
which we note $|.|_W$ (i.e. for any $\rho \in W$, $|\rho|_W := \sup_{t \in [0, 1]} |\rho(t)|_{\mathbb{R}^d}$). Hence it is turned into a
measurable space thanks to its Borel sigma-field $\mathcal{B}(W)$. On the other hand the coordinate process
$(W_t)$ is defined by

$$(t, \omega) \in [0, 1] \times W \to W_t(\omega) := \omega(t) \in \mathbb{R}^d$$

In view of the preliminary Section [2] we take the filtration $(\mathcal{B}_t(W))$ to be the filtration generated
by the coordinate process, i.e. for any $t \in [0, 1]$ by

$$\mathcal{B}_t(W) := \sigma(W_s, s \leq t)$$

Still to be consistent with Section [5] we note $(\mathcal{F}_t^\nu)$ the usual augmentation of $(\mathcal{B}_t(W))_{t \in [0, 1]}$ with
respect to any probability $\nu \in \mathcal{P}(W)$. The Wiener measure which we note $\mu$ is the Borel probability
under which this coordinate process is a Brownian motion. In particular, $\mu \in \mathcal{P}(W)$ is fully
characterized by the property that for any $s < t$ and any $\lambda \in \mathbb{R}^d$

$$E_\mu[\exp(i \langle \lambda, W_t - W_s \rangle_{\mathbb{R}^d}) | \mathcal{B}_s(W)] = \exp\left( -(t - s) \frac{|\lambda|^2_{\mathbb{R}^d}}{2} \right)$$

The classical Wiener space is then the space $W$ turned into a probability space by the Wiener
measure $\mu$. Another important space which will play a key role in Section [8] is the Cameron-Martin
space $H$ which is associated to $W$ and $\mu$. It is defined to be the set of the $\omega \in W$ such that

$$(\tau_\omega)_*\mu \sim \mu$$

(i.e. equivalent) where

$$\tau_\omega : \tilde\omega \in W \to \tilde\omega + \omega \in W \tag{7.33}$$
As a matter of fact this space is explicitly given by

$$H = \left\{ h \in W : h = \int_0^. \dot h_s ds, \ \int_0^1 |\dot h_s|^2_{\mathbb{R}^d} ds < \infty \right\}$$

and $H$ is a Hilbert space for the scalar product

$$\langle h, k \rangle_H := \int_0^1 \langle \dot h_s, \dot k_s \rangle_{\mathbb{R}^d} ds$$

whose associated norm is noted $|.|_H := \sqrt{\langle ., . \rangle_H}$. In this paper we extend $|.|_H$ as a function
$|.|_H : W \to \mathbb{R} \cup \{\infty\}$ by setting $|\omega|_H = \infty$ for $\omega \notin H$. We now recall some basic facts concerning
probabilities absolutely continuous with respect to the Wiener measure. Let $\nu \in \mathcal{P}(W)$ be such that
$\nu \ll \mu$ (i.e. absolutely continuous). Borel's isomorphism ensures that both $\mathcal{R}(\mu, \nu)$ and $\mathcal{R}(\nu, \mu)$ are
always nonempty and even contain an isomorphism of probability spaces. However we do not have a
general theorem to know whether $\mathcal{R}_a(\mu, \nu)$ is nonempty. On the other hand, the Girsanov theorem
is well known to yield the following (for instance see [12], [13], [M], or [44]): for any $\nu \in \mathcal{P}(W)$ such
that $\nu \ll \mu$, there exists a unique $V : W \to W$ defined $\nu$-a.s. which further satisfies the two
properties below

(1) $\nu$-a.s. $V - I_W \in H$

(2) $(t, \omega) \in [0, 1] \times W \to V_t(\omega) \in \mathbb{R}^d$ is an $(\mathcal{F}_t^\nu)$-Brownian motion on $(W, \mathcal{F}^\nu, \nu)$.

In particular, note that $V \in \mathcal{R}_a(\nu, \mu)$ (see Section [5]). In this paper we call $V$ the Girsanov shift
associated to $\nu$, and we call $v := V - I_W$ the Girsanov drift of $\nu$. This latter is fully characterized by
its integrability along with its ability to express the density. Namely $v := \int_0^. \dot v_s ds$ satisfies $\nu$-a.s.

$$\int_0^1 |\dot v_s|^2_{\mathbb{R}^d} ds < \infty \tag{7.34}$$

and $\nu$-a.s.

$$\frac{d\nu}{d\mu} = \exp\left( -\int_0^1 \langle \dot v_s, dW_s \rangle_{\mathbb{R}^d} - \frac{1}{2} \int_0^1 |\dot v_s|^2_{\mathbb{R}^d} ds \right) \tag{7.35}$$

By the Girsanov theorem (see [5]), we know $t \to W_t$ to be a semimartingale under $\nu$ so that the
stochastic integral which appears in (7.35) is well defined. Moreover, still by the Girsanov theorem
it is equivalent to define the Girsanov drift $v$ to satisfy (7.34) and (7.35) and then to define the
Girsanov shift $V$ by setting $\nu$-a.s.

$$V := I_W + v$$
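As a sanity check of (7.34) and (7.35), it may help to work out the simplest case, which is not treated explicitly in the text: a deterministic shift. The direction $h$ below is a hypothetical choice, and $\nu$ is assumed to be the law of $W + h$ under $\mu$. The density is then given by the Cameron-Martin theorem, and comparing it with (7.35) identifies the Girsanov drift and the Girsanov shift.

```latex
% Assumption (not from the paper): h = \int_0^. \dot h_s ds is a fixed element of H
% and \nu := (\tau_h)_* \mu is the law of W + h under the Wiener measure \mu.
\begin{align*}
  \frac{d\nu}{d\mu}
    &= \exp\Big( \int_0^1 \langle \dot h_s, dW_s \rangle_{\mathbb{R}^d}
       - \frac{1}{2} \int_0^1 |\dot h_s|^2_{\mathbb{R}^d}\, ds \Big)
    && \text{(Cameron-Martin theorem)} \\
  \dot v_s &= -\dot h_s, \qquad v = -h
    && \text{(term by term comparison with (7.35))} \\
  V &= I_W + v = I_W - h
    && \text{(a Brownian motion under } \nu \text{, since } W = B + h \text{ in law)}
\end{align*}
```

In this case (7.34) reduces to $|h|_H < \infty$, i.e. to the requirement that $h$ lies in the Cameron-Martin space.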

In terms of stochastic differential equations, property (2) exactly means that $(I_W, V)$ is a solution to the
stochastic differential equation

$$dX_t = dB_t - \dot v_t \circ X dt; \quad X_0 = 0 \tag{7.36}$$

on the space $(W, \mathcal{F}^\nu, \nu)$ with the filtration $(\mathcal{F}_t^\nu)$. Together with Föllmer's formula which we recall
below, this latter formulation of (2) explains the key role of these shifts in Euclidean quantum
mechanics (see [51] for a pedagogical account of this). Concerning (7.36), an important question is
to know whether it has a unique strong solution i.e. to know whether there exists a $U \in \mathcal{R}_a(\mu, \nu)$
such that for any measurable $X : \Omega \to W$ and Brownian motion $B : \Omega \to W$ defined on a space
$(\Omega, \mathcal{A}, \mathcal{P})$, $(X, B)$ is a solution to (7.36) if and only if $\mathcal{P}$-a.s.

$$X = U(B) \tag{7.37}$$
Generally, this is not the case, as shown by Cirelson's counterexample (see [33] or [IB]). As a
matter of fact, it was recently noticed by Üstünel and Zakai (see [55], [35], [3D], [31], and also [22],
[23]) that the existence of a unique strong solution to (7.36) is equivalent to the condition that $V$
is an isomorphism of filtered probability spaces (see Section [2]). Of course in this case the inverse
$U \in \mathcal{R}_a(\mu, \nu)$ of the Girsanov shift $V \in \mathcal{R}_a(\nu, \mu)$ is the same as in (7.37). In [40] a criterion of
invertibility of $V$ based on the relative entropy was provided, which was extended in [21]. It states
that for any probability $\nu \in \mathcal{P}(W)$ with finite entropy i.e. $H(\nu|\mu) < \infty$ where

$$H(\nu|\mu) = E_\nu\left[ \ln \frac{d\nu}{d\mu} \right] \tag{7.38}$$

and for any $U \in \mathcal{R}_a(\mu, \nu)$ we have

$$2H(\nu|\mu) \leq E_\mu[|U - I_W|^2_H] \tag{7.39}$$

with equality if and only if $V$ is an isomorphism of filtered probability spaces with inverse $U$. This
latter relies on the celebrated formula of Föllmer ([12], [13], [14]) which states that

$$2H(\nu|\mu) = E_\nu[|V - I_W|^2_H] \tag{7.40}$$
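A small numerical sketch may help to read Föllmer's formula in the simplest situation. It relies on assumptions which are not in the text: the dimension is 1, the probability is the law of W + h for a deterministic path h (so that the Girsanov drift is -h and the formula predicts a relative entropy equal to half the squared Cameron-Martin norm of h), and the path space is replaced by the n increments of the path on a regular grid. On that grid the increments are independent Gaussian variables with variance dt under both laws, so the relative entropy is a finite sum of the elementary terms m^2 / (2 dt). The function name and the choice h(t) = t^2 are hypothetical.

```python
def discretized_entropy(h, n):
    """Relative entropy between the laws of the n grid increments of W + h and of W.

    Each increment is N(dh_i, dt) under the shifted law and N(0, dt) under the
    Wiener measure, and increments are independent, so the entropies add up.
    """
    dt = 1.0 / n
    total = 0.0
    for i in range(n):
        dh = h((i + 1) * dt) - h(i * dt)
        total += dh * dh / (2.0 * dt)
    return total

# Hypothetical shift h(t) = t^2: its derivative is 2t, so half the squared
# Cameron-Martin norm is (1/2) * integral of 4 t^2 over [0, 1] = 2/3.
entropy_on_grid = discretized_entropy(lambda t: t * t, 1000)
predicted = 2.0 / 3.0
print(entropy_on_grid, predicted)
```

The grid value increases to the predicted one as the mesh is refined, in line with the fact that the entropy on the path space is the supremum of the entropies of the finite dimensional marginals.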

We now introduce the Malliavin derivative which provides an explicit computation of the Girsanov
shift $V$ of a probability $\nu$ which is absolutely continuous with respect to the Wiener measure. This
formula, which may be well known even in the case of absolute continuity, is recalled in (7.42).
Malliavin calculus and the explicit expression of the shifts will only be involved in a few results of
Section [8], while it is rather technical. For that reason, we encourage a reader who is not
familiar with this topic to skip the end of this section on a first reading, and to go directly to Section [8].
We only recall the basic definitions and results necessary to express the Girsanov drift and we
refer to [25], [37], [38], [32], [18], or [32] for an overview of this topic. The Malliavin derivative
$\nabla$ extends the finite dimensional Sobolev derivative with respect to the Gaussian measure to the
infinite dimensional space $W$ endowed with the Wiener measure. We denote by $\mathcal{P}_{ol}$ the set of the
smooth functionals $F : W \to \mathbb{R}$ of the shape

$$F = f(W_{t_1}, ..., W_{t_n})$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is a polynomial and where $t_1, ..., t_n \in [0, 1]$ for an $n \in \mathbb{N}$. By definition, the
translations $\tau_h$ (see (7.33)) along any direction $h := \int_0^. \dot h_s ds \in H$ of the Cameron-Martin space
are quasi-automorphisms of the Wiener space (i.e. $(\tau_h)_*\mu \sim \mu$). This latter fact ensures that for any
$F \in \mathcal{P}_{ol}$, the following Gateaux derivative is well defined $\mu$-a.s.

$$\nabla_h F := \frac{d}{d\lambda}\Big|_{\lambda = 0} F \circ \tau_{\lambda h} = \sum_{i=1}^n (\partial_i f)(W_{t_1}, ..., W_{t_n}) h_{t_i} \tag{7.41}$$

We define $\nabla F$ to be the $H$-valued random variable given by $\mu$-a.s.

$$\nabla F := \sum_{i=1}^n (\partial_i f)(W_{t_1}, ..., W_{t_n}) \int_0^. 1_{[0, t_i]}(s) ds$$

so that $\mu$-a.s.

$$\nabla_h F = \langle \nabla F, h \rangle_H$$

where $h \in H$. Thus we obtain a linear operator $\nabla : \mathcal{P}_{ol} \subset L^2(\mu) \to L^2(\mu, H)$ (the set of the mappings
$X : W \to H$ defined $\mu$-a.s. and such that $E_\mu[|X|^2_H] < \infty$). Thanks to the Cameron-Martin theorem
it is easy to see that although $\nabla$ is not a closed operator, it is however closable. We still denote by
$\nabla : Dom_2(\nabla) \subset L^2(\mu) \to L^2(\mu, H)$ the closure of $\nabla : \mathcal{P}_{ol} \subset L^2(\mu) \to L^2(\mu, H)$. We recall that the
closure of $\nabla$ is defined in the following way: $Dom_2(\nabla)$ is defined to be the set of the $F \in L^2(\mu)$
for which there is a sequence of cylindrical random variables $(F_n)_{n \in \mathbb{N}} \subset \mathcal{P}_{ol}$ with the property
that $\lim_{n \to \infty} F_n = F$ in $L^2(\mu)$ and $(\nabla F_n)$ is Cauchy in $L^2(\mu, H)$. Thus for any $F \in Dom_2(\nabla)$ we
can define $\nabla F := \lim_{n \to \infty} \nabla F_n$ which is unique since $\nabla$ is closable. By construction $Dom_2(\nabla)$ is
the completion of $\mathcal{P}_{ol}$ with respect to the norm of the graph associated to $\nabla$ which is defined by
$\|F\|_{2,1} = \|F\|_{L^2(\mu)} + \|\nabla F\|_{L^2(\mu, H)}$ and we note $\mathbb{D}_{2,1}$ the Banach space $Dom_2(\nabla)$ endowed with the
norm $\|.\|_{2,1}$. Of course $\nabla$ is nothing but the infinite dimensional version of the Sobolev derivative
with respect to the Gaussian measure, and $\mathbb{D}_{2,1}$ is the Sobolev space associated to this infinite
dimensional weak Sobolev derivative. We now turn to the definition of its adjoint $\delta$ which is closely
related to the stochastic integral. It is easy to see that $\mathcal{P}_{ol}$ is dense in every $L^p(\mu)$ (see [18] for
a short proof). Since $\mathcal{P}_{ol} \subset \mathbb{D}_{2,1}$, the operator $\nabla : Dom_2(\nabla) \subset L^2(\mu) \to L^2(\mu, H)$ has a dense
domain. Therefore there is an operator $\delta$, the so called divergence, which is the adjoint of $\nabla$. The
domain $Dom_2(\delta)$ is defined classically as being the set of the random variables $\xi \in L^2(\mu, H)$ such
that there is a constant $c$ with $E_\mu[\langle \nabla\phi, \xi \rangle_H] \leq c |\phi|_{L^2(\mu)}$ for any $\phi \in \mathbb{D}_{2,1}$. For any $\xi \in Dom_2(\delta)$, $\delta\xi$ is
characterized by the relation

$$E_\mu[\phi \, \delta\xi] = E_\mu[\langle \nabla\phi, \xi \rangle_H]$$

which holds for any $\phi \in Dom_2(\nabla)$. Of course, this relation is the infinite dimensional counterpart
of the integration by parts with respect to the Gaussian measure. Let $L_a^2(\mu, H)$ be the subset of
$L^2(\mu, H)$ whose elements are adapted to $(\mathcal{F}_t^\mu)$. By applying the definition (7.41) to smooth simple
processes, and then by taking the limit, it is not difficult to check that $L_a^2(\mu, H) \subset Dom_2(\delta)$ and that
for any $u \in L_a^2(\mu, H)$ we even have

$$\delta u = \int_0^1 \dot u_s dW_s$$

which is the stochastic integral with respect to the coordinate process which is Brownian under
$\mu$. Since $L_a^2(\mu, H)$ is a closed subspace of the Hilbert space $L^2(\mu, H)$ we can define the associated
projection $\pi^\mu$. Together with the martingale representation theorem which ensures the density of
$\{\delta u, u \in L_a^2(\mu, H)\}$ in $\{F - E_\mu[F], F \in L^2(\mu)\}$, after a short calculation the integration by parts
yields the celebrated Clark-Ocone formula: for any $X \in \mathbb{D}_{2,1}$ we have

$$X = E_\mu[X] + \delta\pi^\mu\nabla X$$

By noting $(D_s X)$ the density of $\nabla X$ with respect to the Lebesgue measure, which is defined such
that $\mu$-a.s.

$$\nabla X = \int_0^. D_s X ds$$

the Clark-Ocone formula reads $\mu$-a.s.

$$X = E_\mu[X] + \int_0^1 E_\mu[D_s X | \mathcal{F}_s^\mu] dW_s$$

for any $X \in \mathbb{D}_{2,1}$. Although the following is well known we recall the proof for the convenience of
the reader:






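Proposition 4. Assume that $\nu$ is a probability which is absolutely continuous with respect to the
Wiener measure, and note $V := I_W + v$ (resp. $v = \int_0^. \dot v_s ds$) its Girsanov shift (resp. drift). Then
we have $dt \times d\nu$ a.s.

$$\dot v_s = -E_\nu\left[ D_s \ln \frac{d\nu}{d\mu} \,\Big|\, \mathcal{F}_s^\nu \right] \tag{7.42}$$

where $D_s$ still denotes the density of the Malliavin derivative. Otherwise stated, by noting $\pi^\nu$ the
orthogonal projection on the subspace of $L^2(\nu, H)$ whose elements are adapted, we have

$$v = -\pi^\nu \nabla \ln \frac{d\nu}{d\mu}$$

and

$$V = I_W - \pi^\nu \nabla \ln \frac{d\nu}{d\mu}$$

Proof: The Clark-Ocone formula yields $\mu$-a.s.

$$\frac{d\nu}{d\mu} = 1 + \int_0^1 E_\mu\left[ D_s \frac{d\nu}{d\mu} \,\Big|\, \mathcal{F}_s^\mu \right] dW_s$$

Since $\nu$-a.s. $\frac{d\nu}{d\mu} > 0$, it is straightforward to check that we obtain $\nu$-a.s.

$$\frac{d\nu}{d\mu} = 1 + \int_0^1 E_\mu\left[ \frac{d\nu}{d\mu} \,\Big|\, \mathcal{F}_s^\mu \right] \frac{E_\mu\left[ D_s \frac{d\nu}{d\mu} \,\big|\, \mathcal{F}_s^\mu \right]}{E_\mu\left[ \frac{d\nu}{d\mu} \,\big|\, \mathcal{F}_s^\mu \right]} dW_s$$

Moreover Bayes formula reads $dt \times d\nu$-a.s.

$$E_\nu\left[ D_s \ln \frac{d\nu}{d\mu} \,\Big|\, \mathcal{F}_s^\nu \right] = \frac{E_\mu\left[ D_s \frac{d\nu}{d\mu} \,\big|\, \mathcal{F}_s^\mu \right]}{E_\mu\left[ \frac{d\nu}{d\mu} \,\big|\, \mathcal{F}_s^\mu \right]}$$

On the other hand, the Girsanov theorem implies that $t \to W_t$ is an $(\mathcal{F}_t^\nu)$ semimartingale. Hence Itô's
formula yields $\nu$-a.s.

$$\frac{d\nu}{d\mu} = \exp\left( \int_0^1 E_\nu\left[ D_s \ln \frac{d\nu}{d\mu} \,\Big|\, \mathcal{F}_s^\nu \right] dW_s - \frac{1}{2} \int_0^1 \left| E_\nu\left[ D_s \ln \frac{d\nu}{d\mu} \,\Big|\, \mathcal{F}_s^\nu \right] \right|^2_{\mathbb{R}^d} ds \right) \tag{7.43}$$

As any continuous martingale with finite variation vanishes, the equations (7.35) and (7.43) directly
yield the desired result. □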



Remark 3. We note here that just as the derivative in the sense of distributions extends the Sobolev
derivative, it is possible to build distributions on Wiener space and to extend the derivative and the
divergence to these so-called Watanabe distributions (see [18]). By means of this derivative, the
Clark-Ocone formula can be extended to any $X \in L^1(\mu)$ (see [42]) and for that reason it is reasonable
to think that Proposition [4], which expresses the Girsanov drift in terms of Malliavin calculus,
can be extended to encompass any density. However, since our aim in the following section is mainly to stress
the analogies with non causal optimal transport, we have limited ourselves to the Sobolev case for the
sake of simplicity.
8. Stochastic differential equations as optimal transport problems

In Section [6] we emphasized that any weak solution $(X, B)$ to a stochastic differential equation is
a causal coupling. In Theorem [4] we will show that under slight conditions, weak solutions to

$$dX_t = dB_t - \dot v_t \circ X dt; \quad X_0 = 0 \tag{8.44}$$

represent an optimum of a causal Monge-Kantorovich problem. Moreover in Corollary [2] we state
that (8.44) has a unique strong solution if and only if this Monge-Kantorovich solution is also a
solution to the related causal Monge problem. In particular, this latter shows that the problems of
existence of a unique strong solution to (8.44) are related to the geometry of optimal causal
transports. We then provide a dual formulation of these problems in Corollary [4]. All these results extend
easily to any initial condition, and to the Brownian sheet, with the same proofs (see [23]). Moreover
apart from the dual formulation, these results extend to any abstract Wiener space endowed with
the filtration induced by some continuous resolution of the identity (see [43] and [21]). However,
since this section is mainly aimed at motivating the problems of causal optimal transport, it seems
more relevant here to work in a framework as simple as possible. Let us note that in the case of
a probability equivalent to the Wiener measure, Theorem [4] already appeared in a very close
form in [23], and that Corollary [2] is an alternative formulation of Üstünel's criterion (see Section [7],
equation (7.39)) which appeared in [21]. However in these cases the connexion to causal transport
problems on the product space did not appear at all. Moreover, Corollary [4], which provides a
dual formulation of these problems, is completely new. As the proof we provide here shows, the
structure of causal transference plans plays a key role in these results, and these latter considerably
shorten the proofs.

The two following propositions recall several well known facts which we will use in the sequel. We
provide here a compact proof by means of causal transference plans. In particular, it is worth noting
that these results follow directly from the properties of the Girsanov shift $V$ and from the structure of
causal transference plans.

Proposition 5. Let $\nu$ be a probability absolutely continuous with respect to $\mu$ and let $V$ (resp.
$v = \int_0^\cdot v_s\,ds := V - I_W$) be its Girsanov shift (resp. drift) which was defined in Section [7]. Then

(8.45)    $(I_W \times V)_*\nu \in \Pi_c(\nu, \mu)$

and

(8.46)    $(V \times I_W)_*\nu \in \Pi_c(\mu, \nu)$

In particular, for any $\mathcal{A}/\mathcal{B}(W)$ measurable mapping $X : \Omega \to W$ defined on a complete space $(\Omega, \mathcal{A}, \mathcal{P})$
and such that $X_*\mathcal{P} = \nu$, we have that $V \circ X$ is a $(\mathcal{G}^X_t)$-Brownian motion, where $(\mathcal{G}^X_t)$ denotes the
filtration generated by $X$.

Proof: Since $V$ is $(\mathcal{F}^\nu_t)$-adapted (see Section [7]) and $V_*\nu = \mu$, (8.45) directly follows from the
definition of the set $\Pi_c(\nu, \mu)$ of the causal transference plans from $\nu$ to $\mu$. On the other hand, as
we have recalled in Section [7], the process $t \to V_t$ is well known to be an $(\mathcal{F}^\nu_t)$-Brownian motion, so
that it is also a $(\sigma(\mathcal{F}^\nu_t \cup \mathcal{G}^V_t))$-Brownian motion. Hence (8.46) directly follows by Proposition [2]. Now
let $X$ be defined on a space $(\Omega, \mathcal{A}, \mathcal{P})$ with the further property that $X_*\mathcal{P} = \nu$. By definition of the
image measure, (8.46) yields

$(V \circ X \times X)_*\mathcal{P} = (V \times I_W)_*(X_*\mathcal{P}) = (V \times I_W)_*\nu \in \Pi_c(\mu, \nu)$

Hence, by Proposition [2], $V \circ X$ is a $(\sigma(\mathcal{G}^{V \circ X}_t \cup \mathcal{G}^X_t))$-Brownian motion. On the other hand, since
$V$ is adapted, by considering the filtration generated by $X$ as an inverse image (see Section [2]), it
is straightforward to check that $V \circ X$ is adapted to $(\mathcal{G}^X_t)$. These two facts mean that $V \circ X$ is a
$(\mathcal{G}^X_t)$-Brownian motion. □



Remark 4. This proposition directly yields the following. We recall that the sets $\mathcal{R}(\mu, \nu)$ and
$\mathcal{R}_a(\mu, \nu)$ were defined in Section [?], as well as the isomorphisms of (filtered) probability spaces. Let
$\nu \in \mathcal{P}(W)$ and let $U \in \mathcal{R}(\mu, \nu)$ be an isomorphism of probability spaces with inverse $V \in \mathcal{R}(\nu, \mu)$. Then
we have $U \in \mathcal{R}_a(\mu, \nu)$ if and only if $V$ is a $(\sigma(\mathcal{G}^V_t \cup \mathcal{F}^\nu_t))$-Brownian motion. Moreover, still in this
case, $U$ is an isomorphism of filtered probability spaces if and only if $V$ is an $(\mathcal{F}^\nu_t)$-Brownian motion.

The following proposition, which will be useful in the sequel, recalls a result of basic use in
stochastic control. As the proof shows, it expresses the constraints of causal transference
plans for the set $\Pi_c(\mu, \nu)$ in the case where $H(\nu|\mu) < \infty$.

Proposition 6. Let $\nu$ be a probability absolutely continuous with respect to $\mu$ with finite entropy
(i.e. $H(\nu|\mu) < \infty$) and let $V$ (resp. $v = \int_0^\cdot v_s\,ds := V - I_W$) be its Girsanov shift (resp. drift)
which was defined in Section [7]. Further consider two continuous processes $(X, B)$ defined on a same
complete probability space $(\Omega, \mathcal{A}, \mathcal{P})$ which satisfy:

$E_{\mathcal{P}}[|X - B|_H^2] < \infty$

and

$(B \times X)_*\mathcal{P} \in \Pi_c(\mu, \nu)$

Then $ds \times d\mathcal{P}$-a.s. we have

(8.47)    $E_{\mathcal{P}}[\dot{u}_s | \mathcal{G}^X_s] = -v_s \circ X$

where

$u := X - B$

and where $(\mathcal{G}^X_t)$ denotes the filtration generated by $X$.

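Proof: Consider a pair of processes $(X, B)$, as defined in the claim, which is such that $(B \times X)_*\mathcal{P} \in
\Pi_c(\mu, \nu)$. Let $(\dot{\theta}_s)$ be any process defined $\nu$-a.s., which is adapted to $(\mathcal{F}^\nu_t)$ and which further
satisfies $\theta \in L^2(\nu, H)$, i.e.

(8.48)    $E_\nu\left[\int_0^1 |\dot{\theta}_s|^2_{\mathbb{R}^d}\,ds\right] < \infty$

In particular, by setting $\theta := \int_0^\cdot \dot{\theta}_s\,ds$ we have $\theta \in L^2_a(\nu, H)$, which is the closed subspace of the
$(\mathcal{F}^\nu_t)$-adapted elements of the Hilbert space $L^2(\nu, H)$. On the other hand, since $X_*\mathcal{P} = \nu$,
Proposition [5] shows that the process $t \to V_t \circ X$ is a $(\mathcal{G}^X_t)$-Brownian motion under $\mathcal{P}$, while (8.48) now
reads

$E_{\mathcal{P}}\left[\int_0^1 |\dot{\theta}_s \circ X|^2_{\mathbb{R}^d}\,ds\right] < \infty$

Since $\dot{\theta}_s \circ X$ is adapted to $(\mathcal{G}^X_t)$, from these two latter facts we obtain

(8.49)    $E_{\mathcal{P}}\left[\int_0^1 \langle \dot{\theta}_s \circ X, d(V_s \circ X) \rangle_{\mathbb{R}^d}\right] = 0$

Moreover, by Proposition [2], $B$ is a $(\sigma(\mathcal{G}^B_t \cup \mathcal{G}^X_t))$-Brownian motion. Hence, similarly we get

(8.50)    $E_{\mathcal{P}}\left[\int_0^1 \langle \dot{\theta}_s \circ X, dB_s \rangle_{\mathbb{R}^d}\right] = 0$

By linearity, (8.49) and (8.50) yield the result. Indeed, for any $\theta \in L^2_a(\nu, H)$ we obtain

$E_{\mathcal{P}}[\langle u, \theta \circ X \rangle_H] = E_{\mathcal{P}}[\langle V \circ X - B - v \circ X, \theta \circ X \rangle_H]$

$= E_{\mathcal{P}}\left[\int_0^1 \langle \dot{\theta}_s \circ X, d(V_s \circ X) \rangle_{\mathbb{R}^d}\right] - E_{\mathcal{P}}\left[\int_0^1 \langle \dot{\theta}_s \circ X, dB_s \rangle_{\mathbb{R}^d}\right] - E_{\mathcal{P}}[\langle v \circ X, \theta \circ X \rangle_H]$

$= -E_{\mathcal{P}}[\langle v \circ X, \theta \circ X \rangle_H]$

which is the result. □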



Theorem 4. Let $\nu \in \mathcal{P}(W)$ be a probability absolutely continuous with respect to $\mu$, whose Girsanov
shift (see Section [7]) is noted $V$. Further assume that $\nu$ has a finite entropy with respect to the
Wiener measure (i.e. $H(\nu|\mu) < \infty$). Then we have

(8.51)    $2H(\nu|\mu) = \inf\left(\left\{ \int_{W \times W} |x - y|_H^2\,d\gamma(x, y) : \gamma \in \Pi_c(\mu, \nu) \right\}\right)$

Moreover, the causal Monge-Kantorovich problem defined by the right hand term of (8.51) has a
unique solution $\gamma^\star$ which is given by

(8.52)    $\gamma^\star = (V \times I_W)_*\nu$

i.e. the optimal Monge-Kantorovich causal transference plan $\gamma^\star$ is the joint law of the solution
$(I_W, V)$ to the stochastic differential equation

(8.53)    $dX_t = dB_t - v_t \circ X\,dt; \quad X_0 = 0$

where $v := V - I_W = \int_0^\cdot v_s\,ds$ is the Girsanov drift of $\nu$ (see Section [7]). Moreover, if we further
assume that

$\frac{d\nu}{d\mu} \in \mathbb{D}_{2,1}$

then

(8.54)    $\gamma^\star = ((I_W - \pi_\nu \nabla \phi) \times I_W)_*\nu$

where

$\phi := \ln \frac{d\nu}{d\mu}$

and where $\pi_\nu$ and $\nabla$ are defined in Section [?].

Proof: Let $\gamma \in \Pi_c(\mu, \nu)$ and let $(X, B)$ be two processes which realize $\gamma$ on a complete probability
space $(\Omega, \mathcal{A}, \mathcal{P})$ with a complete filtration $(\mathcal{A}_t)$, i.e. $B : \Omega \to W$ is an $(\mathcal{A}_t)$-Brownian motion and
$X : \Omega \to W$ is adapted to $(\mathcal{A}_t)$. By Proposition [2] we know that such a realization of $\gamma$ always exists.
We further set

(8.55)    $u := X - B$

If the following condition

(8.56)    $E_{\mathcal{P}}[|u|_H^2] < \infty$

does not hold, then the inequality $2H(\nu|\mu) \le \int_{W \times W} |x - y|_H^2\,d\gamma(x, y)$ in (8.51) holds trivially.
Henceforth we assume that (8.56) holds. Note that from (7.40), the finite entropy condition reads

(8.57)    $E_\nu\left[\int_0^1 |v_s|^2_{\mathbb{R}^d}\,ds\right] < \infty$

On the other hand, by definition

$V \circ X - B = v \circ X + u$

and $X_*\mathcal{P} = \nu$. Hence by Proposition [6] we obtain

$0 \le E_{\mathcal{P}}[|V \circ X - B|_H^2] = E_{\mathcal{P}}[|u|_H^2] - E_\nu[|v|_H^2]$

so that by (7.40) and (8.55) we get

$2H(\nu|\mu) = E_\nu[|v|_H^2] \le E_{\mathcal{P}}[|X - B|_H^2]$

with equality if and only if $V \circ X = B$, i.e. if and only if

(8.58)    $(B \times X)_*\mathcal{P} = (V \times I_W)_*\nu$

Since $(B \times X)_*\mathcal{P} = \gamma$, this is exactly the desired result. Finally, in the particular case where
$\frac{d\nu}{d\mu} \in \mathbb{D}_{2,1}$, the explicit formula (8.54) follows from (7.32). □
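
The key inequality of this proof can be unpacked in two lines. The following is a sketch in the notation of the proof ($u = X - B$, so that $V \circ X - B = v \circ X + u$); it only uses (8.47) of Proposition 6, the fact that $v_s \circ X$ is $\mathcal{G}^X_s$-measurable, the identity $X_*\mathcal{P} = \nu$ and (7.40). The intermediate cross-term identity is spelled out here for convenience and should be read as an expansion of the argument, not as an additional statement of the text.

```latex
\begin{align*}
E_{\mathcal{P}}\big[\langle v\circ X, u\rangle_H\big]
  &= E_{\mathcal{P}}\Big[\int_0^1 \big\langle v_s\circ X,\,
       E_{\mathcal{P}}[\dot u_s \mid \mathcal{G}^X_s]\big\rangle_{\mathbb{R}^d}\,ds\Big]
   = -E_{\mathcal{P}}\big[|v\circ X|_H^2\big]
   = -E_\nu\big[|v|_H^2\big], \\
0 \le E_{\mathcal{P}}\big[|V\circ X - B|_H^2\big]
  &= E_{\mathcal{P}}\big[|v\circ X|_H^2\big]
     + 2\,E_{\mathcal{P}}\big[\langle v\circ X, u\rangle_H\big]
     + E_{\mathcal{P}}\big[|u|_H^2\big]
   = E_{\mathcal{P}}\big[|X-B|_H^2\big] - 2H(\nu|\mu).
\end{align*}
```

In particular the gap between the cost of a causal plan and $2H(\nu|\mu)$ is exactly $E_{\mathcal{P}}[|V \circ X - B|_H^2]$, which vanishes if and only if $V \circ X = B$.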

Remark 5. • This theorem extends easily to the Brownian sheet or, more generally, to any
abstract Wiener space on which a time structure is provided by a continuous resolution of the
identity, as was done in [21] for the Monge problem. It can also be extended to stochastic
differential equations with dispersion whose solutions can be obtained by transformation of
the drift (see [18]), and with an arbitrary starting point at a given $x \in \mathbb{R}^d$.

• By using Remark B.1 of [10], the (i) of our theorem can be used to extend the weak Boué-Dupuis
formula to any measurable $f : W \to \mathbb{R} \cup \{\infty\}$ such that $\mu(\{\omega \in W \,|\, f < \infty\}) > 0$
and $\int |f| e^{-f}\,d\mu < \infty$. Similarly, Corollary [?] below makes it possible to extend the Boué-Dupuis
formula (see [1] or [2]) under the same hypothesis.

• Note that whenever the weak uniqueness of solutions holds for (8.53) (i.e. for any $(X, B)$
which is a solution to (8.53) on a space $(\Omega, \mathcal{A}, \mathcal{P})$ we have $X_*\mathcal{P} = \nu$), then $\gamma^\star$ represents any
weak solution $(X, B)$ to this equation, i.e. $(B \times X)_*\mathcal{P} = \gamma^\star$. In particular, this is the case
whenever $(v_s)$ is defined $\mu$-a.s. and is uniformly bounded (see [18]).

In [21] we provided a variational reformulation which extended the main result of [3D] to the case
of absolute continuity. The following Corollary shows that these results make it possible to formulate
the problems of existence of a unique strong solution in terms of causal optimal transport problems.

Corollary 2. Let $\nu \in \mathcal{P}(W)$ be a probability absolutely continuous with respect to $\mu$, and assume
$\nu$ has a finite entropy (i.e. $H(\nu|\mu) < \infty$). Further note $\gamma^\star$ the unique solution to the causal
Monge-Kantorovich problem defined by the right hand term of (8.51), whose existence is ensured by
Theorem [4]. Then the following assertions are equivalent:

(i) The support of the Monge-Kantorovich solution $\gamma^\star$ is concentrated on the graph of a $U \in
\mathcal{R}_a(\mu, \nu)$ (i.e. $\gamma^\star = (I_W \times U)_*\mu$) which is solution of the causal Monge problem

$\inf\left(\left\{ \int_W |A(x) - x|_H^2\,d\mu(x) : A \in \mathcal{R}_a(\mu, \nu) \right\}\right)$

(ii) The Girsanov shift of $\nu$, which we note $V := I_W + v$ (see Section [7]), is an isomorphism of
filtered probability space (see Section [?]) with inverse $U$.

(iii) There is a $U \in \mathcal{R}_a(\mu, \nu)$ such that

$2H(\nu|\mu) = E_\mu[|U - I_W|_H^2]$

(iv) The equation (8.53) has a unique strong solution.

In this case all these $U$ are the same. Moreover, still in this case, if we further assume that
$\frac{d\nu}{d\mu} \in \mathbb{D}_{2,1}$ we also have

(8.59)    $\gamma^\star = (I_W \times (I_W + (\pi_\nu \nabla \phi) \circ U))_*\mu$

where $\phi = \ln \frac{d\nu}{d\mu}$ and where $\pi_\nu$ was defined in Section [?].

Proof: As mentioned in Section [?], it is elementary to check the well known equivalence (iv) $\Leftrightarrow$
(iii). On the other hand, Theorem [4] states that $\gamma^\star$ is the unique solution to the causal Monge-
Kantorovich problem and that we have

$\gamma^\star = (V \times I_W)_*\nu$

where $V$ is the Girsanov shift (see Section [7]) of $\nu$. This ensures that (i) occurs if and only if there
is a $U$ such that

$(I_W \times U)_*\mu = (V \times I_W)_*\nu$

which is obviously equivalent to (ii). By (8.51), the uniqueness of the solution to the causal Monge-
Kantorovich problem yields the equivalence of (iii) with (i). We now assume that these conditions
are satisfied and we prove the explicit formula in the case where $\frac{d\nu}{d\mu} \in \mathbb{D}_{2,1}$. Since $U$ is the inverse
of $V := I_W + v$ we have

$U = I_W - v \circ U$

so that (8.59) follows from Proposition [?]. □
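
As an illustration of Theorem 4 and Corollary 2, consider the simplest case of a deterministic shift. The sketch below assumes a fixed element $h = \int_0^\cdot \dot h_s\,ds \in H$ (a hypothetical choice made here for illustration, not taken from the text) and takes $\nu$ to be the law of $I_W + h$ under $\mu$, so that the Cameron-Martin theorem gives every object in closed form.

```latex
\begin{align*}
\frac{d\nu}{d\mu}
  &= \exp\Big(\int_0^1 \langle \dot h_s, dW_s\rangle_{\mathbb{R}^d}
       - \tfrac12 |h|_H^2\Big),
  & V &= I_W - h, \qquad v = -h, \\
2H(\nu|\mu)
  &= 2\,E_\nu\Big[\ln\frac{d\nu}{d\mu}\Big] = |h|_H^2,
  & dX_t &= dB_t + \dot h_t\,dt, \qquad X_0 = 0 .
\end{align*}
```

Under these assumptions the unique strong solution is $X = B + h$, so that $|X - B|_H^2 = |h|_H^2 = 2H(\nu|\mu)$, and the optimal plan $\gamma^\star = (V \times I_W)_*\nu = (I_W \times U)_*\mu$ is carried by the graph of the adapted map $U = I_W + h$, which is the inverse of $V$.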



The next Corollary investigates some variational problems on the Wiener space which seem to be
new, and are somehow reciprocal to the problems related to stochastic differential equations. Note
that thanks to Proposition [?] these problems can be formulated in terms of pairs of processes.

Corollary 3. Let $\nu$ be a probability which is absolutely continuous with respect to $\mu$ and whose
entropy $H(\nu|\mu)$ is finite. Then we have

(8.60)    $2H(\nu|\mu) \ge \inf\left(\left\{ \int_{W \times W} |x - y|_H^2\,d\gamma(x, y) : \gamma \in \Pi_c(\nu, \mu) \right\}\right)$

Moreover, the infimum in the causal Monge-Kantorovich problem defined by the right hand term of (8.60) is attained.

Proof: By Proposition [5] we know that

$(I_W \times V)_*\nu \in \Pi_c(\nu, \mu)$

Thus, by using the symmetry of the cost function together with (7.40), we obtain

$2H(\nu|\mu) = E_\nu[|I_W - V|_H^2] \ge \inf\left(\left\{ \int_{W \times W} |x - y|_H^2\,d\gamma(x, y) : \gamma \in \Pi_c(\nu, \mu) \right\}\right)$

Finally, the existence of an optimal plan follows from Theorem [3]. □



To prepare the dual formulation of the causal Monge-Kantorovich problem of the Wiener measure,
we now set

Definition 4. Let $\Pi_c^2(\mu, \nu)$ be the set defined by

(8.61)    $\Pi_c^2(\mu, \nu) := \left\{ \gamma \in \Pi_c(\mu, \nu) : \int_{W \times W} |x - y|_H^2\,d\gamma(x, y) < \infty \right\}$

and let

(8.62)    $u : (x, y) \in W \times W \to u(x, y) := x - y \in W$

For any $\gamma \in \Pi_c^2(\mu, \nu)$ we also denote by $(\dot{u}_s)$ the density of $u$ with respect to the Lebesgue measure,
i.e. such that $\gamma$-a.s.

$u := \int_0^\cdot \dot{u}_s\,ds$

The cost-rate $(c_s^\gamma)$ of $\gamma$ is then defined to be the process given $d\gamma \otimes ds$-a.s. by

(8.63)    $c_s^\gamma := |\dot{u}_s|^2_{\mathbb{R}^d}$

so that $\gamma$-a.s.

$|x - y|_H^2 = \int_0^1 c_s^\gamma\,ds$

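Corollary 4. Let $\nu$ be a probability with finite entropy with respect to the Wiener measure ($H(\nu|\mu) <
\infty$). Then we have

(8.64)    $2H(\nu|\mu) = \sup\left(\left\{ \int_W f\,d\nu + \int_W g\,d\mu : (f, g) \in \Theta_c \right\}\right)$

where $\Theta_c$ is the set of the $(f, g) \in L^1(\nu) \times L^1(\mu)$ with the property that for any $\gamma \in \Pi_c^2(\mu, \nu)$
(see (8.61)) we have $\gamma$-a.s.

(8.65)    $f \circ \pi_2 + g \circ \pi_1 \le \int_0^1 E_\gamma[c_s^\gamma | \mathcal{G}_s]\,ds$

where $(c_s^\gamma)$ is defined by (8.63), where $\pi_1$ (resp. $\pi_2$) denotes the projection of $W \times W$ on its first
(resp. second) coordinate, and where $(\mathcal{G}_t)$ is the filtration generated by $t \to W_t \circ \pi_2$. Moreover
the optimum is attained.
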
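Proof: Let $\gamma \in \Pi_c^2(\mu, \nu)$ and let $u \in L^0(\gamma, H)$ be the mapping which is defined by (8.62). Then for
any $(f, g) \in \Theta_c$ the condition (8.65) yields

$\int_W f\,d\nu + \int_W g\,d\mu = E_\gamma[f \circ \pi_2 + g \circ \pi_1] \le E_\gamma\left[\int_0^1 E_\gamma[c_s^\gamma | \mathcal{G}_s]\,ds\right] = \int_{W \times W} |x - y|_H^2\,d\gamma(x, y)$

where $\pi_1$ (resp. $\pi_2$) denotes the projection of $W \times W$ on its first (resp. second) coordinate.
By taking the infimum over $\Pi_c(\mu, \nu)$ we obtain the inequality

(8.66)    $E_\nu[f] + E_\mu[g] \le \inf\left(\left\{ \int_{W \times W} |x - y|_H^2\,d\gamma(x, y) : \gamma \in \Pi_c(\mu, \nu) \right\}\right)$

for any $(f, g) \in \Theta_c$. We now recall that, as a direct consequence of Proposition [2], the process
$t \to W_t \circ \pi_1$ is an $(\mathcal{F}^\gamma_t)$-Brownian motion under $\gamma$, while by definition of $(\mathcal{F}^\gamma_t)$ (see Section [?]), the
process $t \to W_t \circ \pi_2$ is adapted with law $\nu$. Hence Proposition [6] reads $ds \times d\gamma$-a.s.

(8.67)    $E_\gamma[\dot{u}_s | \mathcal{G}_s] = v_s \circ \pi_2$

Now, we set $f = |v|_H^2$ so that (7.40) reads

(8.68)    $2H(\nu|\mu) = E_\nu[f]$

On the other hand, together with Jensen's inequality, equation (8.67) yields that for any $\gamma \in \Pi_c^2(\mu, \nu)$
we have $\gamma$-a.s.

$f \circ \pi_2 = |v \circ \pi_2|_H^2 = \int_0^1 |E_\gamma[\dot{u}_s | \mathcal{G}_s]|^2_{\mathbb{R}^d}\,ds \le \int_0^1 E_\gamma[|\dot{u}_s|^2_{\mathbb{R}^d} | \mathcal{G}_s]\,ds = \int_0^1 E_\gamma[c_s^\gamma | \mathcal{G}_s]\,ds$

so that $(f, 0) \in \Theta_c$. By (8.66) and (8.68), the result follows directly from Theorem [4]. □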
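
The optimal dual pair can be made explicit in the simplest case. The sketch below assumes, for illustration only, that $\nu$ is the law of $I_W + h$ under $\mu$ for a fixed deterministic $h \in H$ (a hypothetical choice, not taken from the text), so that the Girsanov drift is $v = -h$ and $2H(\nu|\mu) = |h|_H^2$. The pair built in the proof is then a constant function together with zero, and the constraint (8.65) reduces to Jensen's inequality applied to the identity $|E_\gamma[\dot u_s | \mathcal{G}_s]| = |\dot h_s|$ given by Proposition 6.

```latex
\begin{align*}
f \equiv |h|_H^2, \quad g \equiv 0
  &\;\Longrightarrow\;
  \int_W f\,d\nu + \int_W g\,d\mu = |h|_H^2 = 2H(\nu|\mu), \\
|h|_H^2
  = \int_0^1 \big|E_\gamma[\dot u_s \mid \mathcal{G}_s]\big|_{\mathbb{R}^d}^2\,ds
  &\le \int_0^1 E_\gamma\big[|\dot u_s|_{\mathbb{R}^d}^2 \mid \mathcal{G}_s\big]\,ds
  = \int_0^1 E_\gamma\big[c_s^\gamma \mid \mathcal{G}_s\big]\,ds .
\end{align*}
```

The second line holds for every causal plan with finite cost between $\mu$ and $\nu$, which is exactly the admissibility condition for the pair $(f, 0)$.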



The following shows that the previous result trivially gives the Talagrand and the log-Sobolev
inequalities (see [38], [34] and [E]) within this framework. Although the proof appeared elsewhere
(in [23], [40]), it was not written explicitly in the form of optimal transport until now. Moreover,
by using causal transport problems, this very short proof suggests that one may think of the
difference between the relative entropy and the Wasserstein distance as the price to pay to buy all
the information contained in $\mathcal{B}(W)$ at $t = 0$.

Corollary 5. Let $\nu$ be a probability which is absolutely continuous with respect to the Wiener measure
and let

$d^2(\nu, \mu) = \inf\left(\left\{ \int_{W \times W} |x - y|_H^2\,d\gamma(x, y) : \gamma \in \Pi(\mu, \nu) \right\}\right)$

be the square of the Wasserstein distance. Then we have

(8.69)    $d^2(\nu, \mu) \le 2H(\nu|\mu)$

Further assume that $\frac{d\nu}{d\mu} \in \mathbb{D}_{2,1}$, and set

(8.70)    $J(\nu|\mu) := E_\nu\left[\left|\nabla \ln \frac{d\nu}{d\mu}\right|_H^2\right]$

then

(8.71)    $2H(\nu|\mu) \le J(\nu|\mu)$

Proof: Since $\Pi_c(\mu, \nu) \subset \Pi(\mu, \nu)$, Theorem [4] yields (8.69). On the other hand, we obtain (8.71) by
applying Jensen's inequality together with Proposition [?]. □
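
Both inequalities are sharp, as the following worked case suggests. It assumes, for illustration only, that $\nu$ is the law of $I_W + h$ under $\mu$ for a fixed deterministic $h = \int_0^\cdot \dot h_s\,ds \in H$ (a hypothetical choice, not taken from the text); the density is then given by the Cameron-Martin formula and its logarithm has a deterministic gradient.

```latex
\begin{align*}
\ln\frac{d\nu}{d\mu}
  &= \int_0^1 \langle \dot h_s, dW_s\rangle_{\mathbb{R}^d} - \tfrac12 |h|_H^2,
  \qquad \nabla \ln\frac{d\nu}{d\mu} = h, \\
d^2(\nu,\mu) &= 2H(\nu|\mu) = J(\nu|\mu) = |h|_H^2 .
\end{align*}
```

Here the value $d^2(\nu, \mu) = |h|_H^2$ is reached by the coupling $(I_W \times (I_W + h))_*\mu$, which is also causal, so that in this case nothing is gained by dropping the causality constraint and (8.69) and (8.71) are both equalities.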